ABSTRACT


The implementation of control systems using small-scale digital hardware has largely been a neglected issue. However, in the field of digital signal processing a great deal of attention has been paid to the development of results concerning the finite-precision implementation of digital filters. In this thesis, we will use, adapt, and extend these ideas for digital feedback compensators. Specifically, we will primarily focus on steady-state linear-quadratic-Gaussian compensators. For some of the issues involved in compensator implementation, the filtering results apply directly; thus we can use existing concepts. However, in many cases, it will prove necessary to adapt these results. Finally, in our investigation we will uncover several extensions to the results as they apply to digital filters themselves. All three of these aspects are contributions to the development of digital control systems.
Chapter 1: Introduction


The design of time-invariant discrete-time compensators through the use of optimal regulators, pole-placement concepts, observer theory, optimal filtering [1,2,3], and also via classical control methods [4] has received a great deal of attention in the literature. In principle, these mathematical design procedures result in a compensator whose parameters are exact, that is, of infinite precision. In practice, such parameters are computed in double precision. Such a (near) ideal compensator has typically been implemented on large-scale floating-point computer systems, where high speed and accuracy are assured and expense has not really been an issue. As a result, low-cost digital controllers have for the most part been quite simple, usually of the proportional-integral-derivative (PID) type [5].

The recent advances in digital hardware capabilities, such as the development of the microprocessor, have opened up many new applications for low-cost, real-time, small-scale digital control systems [5,6]. Thus the issues that arise in implementing compensators, that is, in approximating them with small-scale digital systems, cannot be ignored. Such issues include speed, finite memory limitations (finite precision), and expense. Because of its higher speed and lower cost, fixed-point arithmetic will be much preferred over floating-point (and is assumed throughout this thesis). However, the effects of finite precision are much worse under fixed-point arithmetic than under floating-point. Such problems have not been addressed at all in the idealized mathematical design procedures that have been developed to date for control systems.

These idealized design procedures will result in an essentially infinite-precision transfer function for the compensator. The term implementation will refer to (1) the selection of a structure, that is, the specification and ordering of the computations that take place in the compensator during each sampling period, and also to (2) the selection of a hardware architecture and components. In implementing an ideal compensator, our aim is to produce a finite-precision digital system which either performs as close to the ideal as is consistent with the expense and speed requirements of the application, or which meets a specific level of performance relative to the ideal as inexpensively as possible subject to certain speed (sampling-rate) constraints. It is important to note that the mathematical design procedure which produces the ideal compensator and the implementation of this ideal compensator are not necessarily independent procedures: the initial design assumes a specific sampling rate, yet the implementation is frequently quite important in determining the maximum sampling rate.

Some effort has been directed towards investigating the issues involved in implementing digital feedback compensators, but it has been somewhat limited. Knowles and Edwards [7] and Curry [8] have each considered a roundoff noise analysis of certain sampled-data systems. Bertram [9], Slaughter [10], Johnson [11], and Lack [12] have developed amplitude bounds on the effects of quantization in sampled-data control systems. Sripad [13] has looked in some depth at the roundoff noise and finite-precision coefficient performance of the discrete-time Kalman filter and linear-quadratic-Gaussian controller. Rink and Chong [14] have derived bounds on the effects of quantization errors in floating-point regulators. Farrar [15] has pointed out in a basic way some of the issues involved in implementing continuous-time linear-quadratic-Gaussian controllers as discrete-time fixed-point microprocessor-based systems.

In his monograph, Willsky [16] has discussed a great number of parallels between the fields of digital signal processing and control and estimation. Many of the basic issues involved in implementing digital feedback compensators have been examined in the context of digital signal processing, and a great many results exist. These digital filtering results are very important for control applications, since a digital control system can be viewed as a digital filter (the compensator) embedded in a feedback loop through a continuous-time plant. However, only in a few special cases do these results apply directly to control. Our task will be to use, adapt, and extend these results to the implementation of digital feedback compensators. In some cases we will directly use the filtering results. Much of the time, however, the control setting adds new twists to the implementation issues, requiring the adaptation of existing results. This effort, bridging two disciplines, is the most important contribution of the thesis. In addition, some of our work extends existing methods, or introduces new approaches, that are also useful for digital filtering applications. This contribution, although limited in scope, can be valuable to researchers in digital signal processing.

In this thesis, the steady-state linear-quadratic-Gaussian (LQG) control problem will be selected to convey our ideas on the implementation of feedback compensators. This type of controller has been shown to have desirable performance properties in terms of its robustness, multivariate formulation, optimal nature, and so forth. The LQG problem has also received a great deal of attention in the recent literature, and is being increasingly applied to real systems. Furthermore, the LQG problem has an explicit scalar objective function, which can be adopted as a performance metric against which the degradation due to finite-wordlength effects can be measured. In fact, this was the degradation measure used by Sripad [13]. It is not necessary to choose this performance metric, or even to use an LQG framework, but such a choice allows us to develop our results in a concrete setting. Using this LQG control framework in the context of a single-input single-output system, we can bring out all the issues we wish to raise.

In Chapter 2 the details of the discrete-time LQG problem under consideration will be presented. Specifically, we will consider a continuous-time plant which is driven by additive white Gaussian noise and whose measured output is also corrupted by Gaussian noise and then sampled at rate 1/T. The ideal discrete-time compensator will minimize an equivalent discrete-time performance index, subject to a piecewise-constant control signal u(t). In presenting the equations for this ideal compensator, an important point will be raised: the finite calculation time implicit in the arithmetic operations of the compensator imposes a limit on the sampling rate of the system. Due to this same finite computation time, a realistic compensator must have its output at a given sample time depend only on past values of the compensator input. The sample-skew approach to this problem, which involves sampling the compensator input and output at different times [1], will also be presented in Chapter 2.

One of the important issues in discussing digital implementations is the notion of a structure. Given the system sample rate, the effects of finite precision on performance depend on the structure chosen, and not on the architecture or components selected. If all compensator computations could be performed with infinite precision, then all structures for implementing a given ideal compensator would be equivalent in performance. However, under the real constraint of finite precision, each structure will in general result in a different performance. Chapter 3 will describe some different compensator structures. Two important points will be stressed. First, the state-space notation prevalent in control and estimation is not sufficient to represent all possible compensator structures. Second, the concept of a structure for digital filters is not quite the same as the concept of a structure for digital compensators. This difference will require us to adapt the notation developed by Chan [17] for the representation of digital filter structures to the control case. A major implication of this change is that an nth-order LQG compensator (for an nth-order system) will have n+1 unit delay elements, and not n as in the case of nth-order filters. An important point will also be raised in Chapter 3 concerning the use of the ideal compensator equations resulting from the LQG design procedure as a computational algorithm; we can simply view this as one possible structure, which we will call the simple structure. We will show that, although this structure has been frequently used, more or less by default, it is not usually a good choice due to its large number of coefficient multiplications.
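To make the point about structures concrete, the sketch below realizes the same second-order denominator in two ways and rounds the stored coefficients to 5 fractional bits. It is a generic digital-filtering illustration, not one of the compensators studied in this thesis; the pole values 0.95 and 0.90, the wordlength, and the helper `quantize` are assumptions chosen for illustration. In the direct form, rounding the polynomial coefficients moves one pole out to z = 1, while the cascade form, which stores each pole separately, moves each pole by at most half a quantization step.

```python
import numpy as np

def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (fixed-point roundoff)."""
    step = 2.0 ** -frac_bits
    return np.round(np.asarray(value, dtype=float) / step) * step

# Hypothetical second-order denominator with two closely spaced real poles.
poles = np.array([0.95, 0.90])
frac_bits = 5  # quantization step q = 1/32

# Direct form: store and quantize the coefficients of (1 - p1 z^-1)(1 - p2 z^-1).
den = np.poly(poles)               # [1, -(p1 + p2), p1 * p2]
den_q = quantize(den, frac_bits)
direct_poles = np.sort(np.roots(den_q).real)

# Cascade form: each first-order section stores its own pole,
# so the poles themselves are what get quantized.
cascade_poles = np.sort(quantize(poles, frac_bits))

print("direct-form poles:", direct_poles)   # poles near 0.84375 and 1.0
print("cascade poles:    ", cascade_poles)  # poles near 0.90625 and 0.9375
```

With infinite precision both realizations have the identical transfer function; only their finite-wordlength behavior differs.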

Architectural issues will be treated in Chapter 4. The ideas of serialism and parallelism, the degrees to which processes run sequentially or concurrently, will be presented in terms of the tradeoff they embody between compensator calculation time, which sets the maximum sample rate, and hardware complexity and expense. These ideas apply directly to digital compensators; no modifications are necessary. However, the same cannot be said for the application of pipelining to control systems. Pipelining is a method for increasing the maximum sampling rate and performance of a system by altering the structure and the resulting transfer function in a very specific way in order to increase its inherent parallelism. Its use raises an important issue: the interaction between the mathematical design of the ideal compensator and the finite-precision implementation of this ideal. The application of pipelining produces additional series delay in a compensator. If ignored, this delay will appear in a control system as extra negative phase shift, and may cause instability. The only way to account for this delay accurately will be to augment the discretized plant model and redesign the ideal compensator at the new sampling rate. Then, if the same pipelining still applies to the new higher-order ideal compensator, improved performance can result.
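A minimal sketch of the plant augmentation mentioned above, assuming that pipelining adds exactly one unit delay in series with the compensator output and using the discrete-time model $(\Phi, \Gamma, L)$ of Chapter 2. The function `augment_with_input_delay` is a hypothetical helper; in a redesign, $\Phi$ and $\Gamma$ would first be recomputed for the new sampling rate, and the augmented triple would then be handed to the usual LQG design.

```python
import numpy as np

def augment_with_input_delay(Phi, Gamma, L):
    """Plant model with one extra unit delay in series with its input (hypothetical helper).

    Augmented state z(k) = [x(k); v(k)] with v(k) = u(k-1):
        x(k+1) = Phi x(k) + Gamma v(k)
        v(k+1) = u(k)
        y(k)   = L x(k)
    """
    n, m = Gamma.shape
    Phi_a = np.block([[Phi, Gamma],
                      [np.zeros((m, n)), np.zeros((m, m))]])
    Gamma_a = np.vstack([np.zeros((n, m)), np.eye(m)])
    L_a = np.hstack([L, np.zeros((L.shape[0], m))])
    return Phi_a, Gamma_a, L_a

# Usage: a scalar plant; the augmented model carries one more state.
Phi_a, Gamma_a, L_a = augment_with_input_delay(
    np.array([[0.9]]), np.array([[0.1]]), np.array([[1.0]]))
```

The first Markov parameter $L_a\Gamma_a$ of the augmented model is zero and the second equals $L\Gamma$ of the original plant, so the input-output response is the original one delayed by one sample; this extra state is what raises the order of the redesigned ideal compensator.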

In Chapters 5, 6, and 7 we will consider the effects of the finite memory limitations of inexpensive small-scale digital control systems. This restriction on memory will necessitate finite precision: the use of compensator coefficients (multipliers) of finite wordlength, and the insertion of quantization or overflow nonlinearities following the compensator input A/D converter and all multiplications (products) and additions. Methods must be found for selecting minimum coefficient and signal wordlengths which still result in acceptable levels of performance degradation, that is, in small-enough increases in the performance index.

Chapter 5 will treat the uncorrelated effects of product and A/D quantization on compensator performance. The major effort is spent on roundoff quantization, since the use of roundoff as opposed to sign-magnitude quantization results in lower levels of degradation, and also since roundoff effects can be analyzed in a tractable way. The main results of Chapter 5 are the adaptations of the scaling and roundoff noise analysis methods of digital filtering to the compensator case. There also arises an important implication concerning set-point LQG configurations and the scaling issue. Finally, minimum roundoff noise compensator structures will be adapted from the work of Mullis and Roberts [18] on minimum roundoff noise filter structures.
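The standard digital-filtering roundoff model that is adapted here can be sketched as follows, under its usual assumptions: each rounding error is white, uniformly distributed over one quantization step $q = 2^{-b}$ (hence of variance $q^2/12$), and uncorrelated with the signal and with the other error sources. Its contribution to the output variance is then $q^2/12$ times the sum of squared impulse-response samples from the point where the error enters to the output. The single-pole example and the function names are hypothetical; for a compensator, the relevant response would be the closed-loop one through the plant.

```python
def roundoff_noise_gain(a, n_terms=10_000):
    """Sum of squared impulse-response samples of y(k) = a*y(k-1) + e(k).

    For |a| < 1 the closed form is 1 / (1 - a**2); the loop truncates the series.
    """
    h = 1.0
    total = 0.0
    for _ in range(n_terms):
        total += h * h
        h *= a
    return total

def output_noise_variance(a, frac_bits):
    """Output variance due to one rounding source with step q = 2**-frac_bits,
    modelled as white noise of variance q**2 / 12 injected at the recursion input."""
    q = 2.0 ** -frac_bits
    return (q * q / 12.0) * roundoff_noise_gain(a)

# Usage: a pole at 0.5 amplifies the injected noise power by 4/3.
print(output_noise_variance(0.5, 8))
```

Poles near the unit circle make the noise gain $1/(1-a^2)$ large, which is one reason the choice of structure and the scaling of internal signals matter.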

A sixth-order LQG control system will be introduced to test the roundoff analysis method of Chapter 5, and a number of different structures will be evaluated on the basis of their roundoff noise performance. We will show a significant similarity between the results for these structures and the results for filter structures. However, two differences will arise. First, the potential presence of many real poles in a feedback compensator will complicate the pairing issue for parallel and cascade structures. Digital filters will typically have at most one real pole, so the pairing of such poles is of no interest. Second, although the default simple structure will perform relatively well, there will be two structures with many fewer coefficients that perform even better.

The effect of finite coefficient wordlength on performance is basically a deterministic one. Given any set of finite-wordlength coefficients, we can compute exactly the resulting performance degradation, that is, the increase in the performance index. However, given a degradation level, it will be much harder to find the set of coefficients with the shortest wordlength whose degradation stays within this level. If we make the common assumption that the ideal values of the coefficients will be rounded to finite wordlengths, then the wordlength determination can be accomplished with repeated evaluations of performance, one per wordlength tested. This procedure must also be repeated for each structure considered. Chapter 6 will describe the analytic methods developed for digital filters. Our emphasis will be on the use of a statistical measure of coefficient wordlength. For digital filters, this involves the use of first-order sensitivities with respect to the coefficients of the structure. However, for LQG compensators, all the first-order sensitivities will be zero, due to the optimal nature of the problem. Thus we will develop two new statistical estimates using second-order sensitivities. The necessity for second-order terms will exist for any parameter optimization problem, such as the sub-optimal reduced-order compensators described in Levine, Johnson, and Athans [19] and the sub-optimal decentralized controllers of Looze, Houpt, Sandell, and Athans [20]. In fact, if a digital filter is designed to minimize some differentiable scalar function, then second-order sensitivities must be used for any statistical wordlength estimate based on that function. This will constitute an extension to the results for the implementation of digital filters.
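As an illustration of why second-order terms enter, here is one simple statistical estimate under stated assumptions; it is not claimed to be either of the two estimates developed in Chapter 6. At an optimum the gradient of J vanishes, so the increase caused by small coefficient errors $d$ is approximately $\frac{1}{2} d' H d$, where $H$ is the Hessian of J with respect to the structure's coefficients (assumed available). If each coefficient is rounded to $b$ fractional bits and the errors are treated as independent and uniform over one step $q = 2^{-b}$, the expected increase is $\mathrm{tr}(H)\,q^2/24$, which can be inverted for $b$. The function names and the Hessian in the usage lines are hypothetical.

```python
import math
import numpy as np

def expected_cost_increase(hessian, frac_bits):
    """Second-order estimate of E[increase in J] after rounding every coefficient.

    Assumes: zero gradient at the optimum, so dJ ~ 0.5 * d' H d; rounding errors
    d_i independent and uniform on [-q/2, q/2] with q = 2**-frac_bits, so
    E[d_i**2] = q**2 / 12 and E[0.5 * d' H d] = trace(H) * q**2 / 24.
    """
    q = 2.0 ** -frac_bits
    return float(np.trace(hessian)) * q * q / 24.0

def wordlength_estimate(hessian, allowed_increase):
    """Real-valued b at which expected_cost_increase(hessian, b) equals allowed_increase."""
    return 0.5 * math.log2(float(np.trace(hessian)) / (24.0 * allowed_increase))

# Usage with a hypothetical diagonal Hessian.
H = np.diag([8.0, 16.0])
b_est = wordlength_estimate(H, 2.0 ** -20)  # 10.0 for this H
b_bits = math.ceil(b_est)                   # round up to an implementable wordlength
```

Because `wordlength_estimate` is a smooth function of $H$, and hence of the coefficients that determine $H$, it can be differentiated, unlike an integer wordlength found by search.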

We will test the same sixth-order control system and structures with the analytical procedures developed for coefficient wordlength effects. Again, we will show the similarity between our results and the filtering results, and demonstrate that other structures with far fewer coefficients perform better than the simple structure. The statistical estimates of wordlength will be compared to the exact wordlengths required to meet a specific degradation level. We will show that the major advantage in using the statistical estimates is not in the computation time they may save over an iterative deterministic method, but in the fact that they are continuous and differentiable in nature. This fact allows us to apply iterative gradient minimization techniques to compute minimum coefficient wordlength structures, as described in Chapter 8. In this procedure, the bulk of the computations for the statistical estimates need be performed only once.
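For contrast, the iterative deterministic method amounts to a search over integer wordlengths with one full performance evaluation per candidate. In this sketch `cost_increase` is a hypothetical callable standing in for the exact computation of the increase in J after rounding every coefficient to `b` fractional bits; the toy quadratic cost in the usage lines, with its optimal coefficients and Hessian, is likewise an assumption.

```python
import numpy as np

def min_wordlength(cost_increase, allowed_increase, max_bits=32):
    """Return the smallest b in 1..max_bits with cost_increase(b) <= allowed_increase.

    Each call to cost_increase is one complete (possibly expensive) evaluation.
    The exact increase need not be monotone in b, so this returns the first b
    that meets the specification; it does not guarantee that every larger b does.
    """
    for b in range(1, max_bits + 1):
        if cost_increase(b) <= allowed_increase:
            return b
    return None  # specification not met within max_bits

# Usage with a toy quadratic cost around hypothetical optimal coefficients.
theta_opt = np.array([0.8123, -0.4567, 0.1234])
H_toy = np.diag([8.0, 16.0, 4.0])

def toy_cost_increase(b):
    step = 2.0 ** -b
    d = np.round(theta_opt / step) * step - theta_opt  # rounding errors
    return 0.5 * d @ H_toy @ d

print(min_wordlength(toy_cost_increase, 1e-6))
```

The result is an integer, and the whole search has to be rerun for every structure and every degradation level, which is why a continuous estimate is more convenient inside an optimization loop.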

In Chapter 7, we will review the methods used in dealing with the correlated effects of the quantization and overflow nonlinearities present in a structure [21]. Any system including nonlinearities can exhibit oscillations, known as limit cycles. In digital filtering, there are three basic approaches to combatting such effects. First, we can use a structure that can be shown to have no limit cycles, given a specific type of nonlinearity. Second, the amplitude of any limit cycles can be upper bounded, allowing us to select a wordlength large enough to make this amplitude negligible. Finally, if a limit cycle occurs, we can inject enough roundoff noise to break up, or quench, the oscillation. Our results in this area for digital compensators are quite limited; however, several observations will be made. First, a control system with an open-loop unstable plant or a plant with an integrator pole must of necessity have a low-amplitude limit cycle. Second, the global feedback loop around the compensator can alter the nature of any limit cycles that would occur in the open-loop compensator, and may even cause limit cycles. This point will be demonstrated for a finite impulse response compensator. (A finite impulse response filter is not recursive; therefore it can exhibit no limit cycles.) Finally, it is not clear that limit cycles will occur at all in LQG systems, given the system driving noise and measurement noise that are present. However, jump phenomena and other correlated noise effects may occur.
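The classic zero-input limit cycle of digital filtering can be reproduced in a few lines. This is an open-loop, first-order illustration with hypothetical coefficients $a = 0.9$ and $a = -0.9$, not one of the compensators of this thesis. Signals are integers measured in units of one quantization step, the product is rounded with ties away from zero, and exact `Fraction` arithmetic is used so that floating-point error cannot blur the rounding decisions.

```python
from fractions import Fraction
import math

def round_half_away(x):
    """Round a Fraction to the nearest integer, ties away from zero."""
    r = math.floor(abs(x) + Fraction(1, 2))
    return r if x >= 0 else -r

def zero_input_response(a, y0, steps):
    """Iterate y(k) = Q[a * y(k-1)] with zero input, Q = rounding to an integer."""
    ys = [y0]
    for _ in range(steps):
        ys.append(round_half_away(a * ys[-1]))
    return ys

print(zero_input_response(Fraction(9, 10), 10, 12))
# [10, 9, 8, 7, 6, 5, 5, 5, 5, 5, 5, 5, 5]
print(zero_input_response(Fraction(-9, 10), 10, 12))
# [10, -9, 8, -7, 6, -5, 5, -5, 5, -5, 5, -5, 5]
```

With infinite precision both responses would decay to zero. Here the first sticks at 5 and the second settles into a ±5 oscillation; 5 is the classical deadband bound $0.5/(1-|a|)$ for this recursion.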

Chapter 8 will present a general iterative optimization technique for producing minimum roundoff noise and minimum coefficient wordlength structures. This procedure has been adapted from the optimization method of Chan for digital filters [17]. Essentially, this technique allows one to select a structure with a predetermined number of coefficients and iteratively vary those coefficients to minimize some scalar criterion. For LQG compensators, this criterion could be the increase in J due to roundoff noise or the increase due to finite-wordlength coefficients, or some combination of these two. For the minimization of roundoff effects, the modification to Chan's procedure will be similar to the modification developed in Chapter 5 for roundoff analysis. However, the minimization of coefficient wordlength will require major changes, since the statistical wordlength expression will actually be minimized, and this involves second-order sensitivities. The optimization procedure in Chapter 8 will also bring out two useful extensions for the digital filtering case. First, in minimizing roundoff noise effects, our procedure will be more general than that of Chan, accounting for the exact number of roundoff error sources and the location of each one in the structure. This generalization can be easily added to Chan's method. Second, we will set forth some general approaches to selecting which portion of a given structure to optimize, that is, the portion that will provide the greatest improvement when optimized. (An unconstrained optimization of the entire structure usually results in too many coefficient multipliers.) These guidelines will also apply to digital filter structural optimization.

Finally, Chapter 9 will review the contributions of this thesis, being careful 
to point out where our results are adaptations and applications of digital filtering 
techniques to the problem of implementing digital compensators, and where our 
results also constitute extensions to the digital filtering techniques. 
Chapter 2: The LQG Problem


A specific problem formulation is necessary to present in a unified manner the issues involved in implementing digital compensators. Historically, control theory has developed two different approaches: classical control (primarily a frequency-domain approach) and modern control (primarily a time-domain approach). For this thesis effort, we have selected the linear-quadratic-Gaussian (LQG) modern control problem for several reasons. The design of LQG systems has received a great deal of attention in recent times [3,22] due to its advantages for control (a multivariate nature, certain robustness properties [23], et cetera). As will be seen, the analysis of LQG compensators brings out all of the issues that we wish to discuss. Furthermore, the LQG problem has a very natural scalar objective criterion for determining its performance: the cost function J (defined below). Such an objective function makes it quite simple to measure the degradation in performance resulting from any given compensator implementation. The most common criticism of the LQG approach, the difficulty in selecting the parameters of J in some meaningful manner, is much less of a problem in light of the recent developments by Harvey and Stein [24], which relate frequency-domain design parameters to the selection of the scalar function J. This effort will thus help make the modern control approach more useful for small-scale low-cost digital systems. However, in principle the issues, approaches, and results developed here apply to any control and/or estimation implementation. This chapter will thus present the set of assumptions inherent in the LQG control problem and describe its discrete-time solution.

Consider a continuous-time plant whose performance is to be improved through feedback. Assume that the nth-order state-space equations (2.1) and (2.2) accurately model the input-output behavior of the plant, including any sensor and actuator dynamics. (Brackets will indicate continuous-time quantities, while parentheses will indicate discrete-time quantities.)

$$\dot{x}[t] = A\,x[t] + B\,u[t] + w_1[t] \qquad (2.1)$$

$$y[t] = C\,x[t] + w_2[t] \qquad (2.2)$$

where the time-invariant system matrix $A$ is n×n, the input gain matrix $B$ is n×m, and the output gain matrix $C$ is p×n. The n-vector $x[t]$, m-vector $u[t]$, and p-vector $y[t]$ represent the system states, inputs, and outputs, respectively. The n-vector $w_1[t]$ and p-vector $w_2[t]$ represent uncorrelated white Gaussian noise sources of covariances $\Xi_1$ and $\Xi_2$, where $\Xi_2 > 0$. It is further assumed that the performance of the system can be expressed as a scalar quantity which is a quadratic function of the states and controls:


$$J_c = E\left\{ \lim_{t_f \to \infty} \frac{1}{2t_f} \int_{-t_f}^{t_f} \left( x'[t]\,Q\,x[t] + u'[t]\,R\,u[t] \right) dt \right\} \qquad (2.3)$$


where $E$ represents the expected value operation and the weighting matrices $R$ and $Q$ satisfy $R > 0$ and $Q \ge 0$. Because of the time-averaging nature of the performance index, this LQG problem is called the steady-state LQG problem [1].

Figure 2-1: LQG Configuration

The control objective will be to minimize the index $J_c$ with a discrete-time linear compensator as shown in the configuration of figure 2-1, where the input $u[t]$ is now piecewise constant. The solution to this problem involves discretizing the plant model and performance index, and then solving the resulting discrete-time LQG problem. Discretizing equations (2.1)-(2.3) for a sampling period of $T$ seconds produces [1,25,26]:

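$$x(k+1) = \Phi\,x(k) + \Gamma\,u(k) + w_1(k) \qquad (2.4)$$

$$y(k) = L\,x(k) + w_2(k) \qquad (2.5)$$

$$J_d = E\left\{ \lim_{l \to \infty} \frac{1}{2l} \sum_{k=-l}^{l} \left( x'(k)\,\hat{Q}\,x(k) + 2\,x'(k)\,M\,u(k) + u'(k)\,\hat{R}\,u(k) \right) \right\} \qquad (2.6)$$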


Note the inclusion of the cross-term weighting matrix $M$ in (2.6). Equations (2.4) and (2.5) describe the behavior of the plant at the sample times, and the index in (2.6) satisfies:

Equation (2.7) does not imply that $J_d$ decreases monotonically towards $J_c$ as $T$ approaches zero. In fact, for systems that are open-loop oscillatory, $J_d$ will be near-infinite if $T$ is an integer multiple of the period of the oscillation [25].

The discrete-time parameters in (2.4)-(2.6) are defined as follows:


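$$\Phi(t) = e^{At}, \qquad \Gamma(t) = \int_0^t \Phi(\tau)\,B\,d\tau$$

$$\Phi = \Phi(T), \qquad \Gamma = \Gamma(T), \qquad L = C \qquad (2.8)$$

$$\hat{Q} = \frac{1}{T} \int_0^T \Phi'(\tau)\,Q\,\Phi(\tau)\,d\tau$$

$$\hat{R} = R + \frac{1}{T} \int_0^T \Gamma'(\tau)\,Q\,\Gamma(\tau)\,d\tau$$

$$M = \frac{1}{T} \int_0^T \Phi'(\tau)\,Q\,\Gamma(\tau)\,d\tau$$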
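A minimal numerical sketch of $\Phi = \Phi(T)$ and $\Gamma = \Gamma(T)$ from (2.8), assuming NumPy. It relies on the standard identity that the exponential of the block matrix $\begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix} T$ holds $e^{AT}$ in its upper-left block and $\int_0^T e^{A\tau}\,d\tau\,B$ in its upper-right block. The helper `expm_taylor` is a hypothetical stand-in (plain scaling and squaring with a truncated Taylor series) for a library matrix exponential such as SciPy's `scipy.linalg.expm`; the weighting integrals $\hat{Q}$, $\hat{R}$, and $M$ are not computed here.

```python
import numpy as np

def expm_taylor(M, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series.

    Adequate for small, well-scaled illustration matrices only.
    """
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    S = M / 2.0 ** s                  # scaled so that ||S||_1 <= 1/2
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ S / k
        result = result + term
    for _ in range(s):                # undo the scaling: e^M = (e^S)^(2^s)
        result = result @ result
    return result

def discretize(A, B, T):
    """Return (Phi, Gamma) = (e^{AT}, int_0^T e^{A tau} d tau B) via one augmented exponential."""
    n, m = B.shape
    aug = np.zeros((n + m, n + m))
    aug[:n, :n] = A
    aug[:n, n:] = B
    E = expm_taylor(aug * T)
    return E[:n, :n], E[:n, n:]

# Usage: a double integrator sampled every 0.1 s.
Phi_dbl, Gamma_dbl = discretize(np.array([[0.0, 1.0], [0.0, 0.0]]),
                                np.array([[0.0], [1.0]]), 0.1)
```

For the double integrator this gives $\Phi = \begin{bmatrix} 1 & 0.1 \\ 0 & 1 \end{bmatrix}$ and $\Gamma = \begin{bmatrix} 0.005 & 0.1 \end{bmatrix}'$, the familiar $T^2/2$ and $T$ entries.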


The discrete uncorrelated white noise vectors $w_1(k)$ and $w_2(k)$ have the following covariance matrices:

$$\Theta_1 = \int_0^T \Phi(\tau)\,\Xi_1\,\Phi'(\tau)\,d\tau, \qquad \Theta_2 = \frac{1}{T}\,\Xi_2 \qquad (2.9)$$


The factor of $1/T$ in the expression for $\Theta_2$ arises from the filter preceding the output sampler in figure 2-1. Such a lowpass filter (of bandwidth $\pi/T$ rad/s) will be assumed to pass the signal $Lx$ unchanged, while filtering the white measurement noise $w_2$. Due to the fictitious nature of white noise (its unlimited bandwidth), one cannot actually sample it unfiltered without obtaining a sample of infinite variance. (Aliasing [28,29,30], which is an overlapping of the spectrum of the sampled signal, would cause the infinite variance.)
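The $1/T$ factor can be checked with a one-line computation, assuming that $w_2$ has a flat two-sided spectral density $\Xi_2$ and that the anti-aliasing filter is an ideal lowpass with cutoff $1/(2T)$ Hz, that is, $\pi/T$ rad/s (an idealization of any real filter):

$$\operatorname{var}\{w_2(k)\} = \int_{-1/(2T)}^{1/(2T)} \Xi_2\,df = \frac{1}{2\pi} \int_{-\pi/T}^{\pi/T} \Xi_2\,d\omega = \frac{\Xi_2}{T} = \Theta_2$$

Letting the cutoff grow without bound makes this integral diverge, which is the infinite-variance remark in quantitative form.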

The solution to the discrete-time LQG problem, given in Sage [27], gives rise to the following ideal compensator:

$$\hat{x}(k+1) = \Phi\,\hat{x}(k) + \Gamma\,u(k) + K\left( y(k+1) - L\left( \Phi\,\hat{x}(k) + \Gamma\,u(k) \right) \right)$$

$$u(k+1) = -G\,\hat{x}(k+1) \qquad (2.10)$$

where $\hat{x}$ represents the state estimate, $G$ is computed off-line as the solution to an optimal regulator problem, and $K$ is computed off-line as the solution to a Kalman filter problem.

Immediately, a problem arises in trying to implement the compensator described in (2.10). The system shown in figure 2-1 and equations (2.4)-(2.6) assumes that the output and input samplers operate simultaneously. However, equations (2.10) clearly show a dependence of $u(k+1)$ on $y(k+1)$. Since it takes a finite amount of time $t_c$ to compute $u(k+1)$ after $y(k+1)$ is present at the sampler output, $u(k+1)$ cannot be generated until some time after the $(k+1)$th sample time. This contradiction makes it impossible to implement (2.10) as described.

Such a problem is easy to avoid once recognized. One way to get around the contradiction is simply to delay the clock driving the zeroth-order hold at the compensator output by $t_c$ seconds. Leaving all else the same, this approach will give approximately the right result whenever $t_c \ll T$. However, a more general procedure that will work for any $T \ge t_c$ is desirable.

Kwakernaak and Sivan [1] have presented such a design method, including the possibility of calculation delay in the initial design. This procedure involves two steps. First, to ensure that the compensator can be physically implemented, we must restrict the control $u(k+1)$ to depend only on observations up to and including $y(k)$, not $y(k+1)$. However, if the calculation time $t_c$ is much less than the sample period $T$, this presents some inefficiency, since the new value $u(k+1)$ is available (the computations are completed) long before it is needed as input to the hold unit. Thus Kwakernaak and Sivan also allow for a delaying of the clock driving the system output ($y$) sample-and-hold unit by a time $\delta$ relative to the clock driving the system input ($u$) zeroth-order hold (sample skew; see figure 2-2). The plant state is therefore discretized at times $kT$ and the output at times $kT+\delta$, although each of these samples will be referred to with '$k$' in the discrete model. The terms '$x(k)$' and '$y(k)$' will no longer represent $x$ and $y$ at the identical instant. This fact must be reflected in the discrete-time model equations [1, Section 6.2].

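The expression for $y[kT+\delta]$ can be written using the variation-of-constants formula:

$$y[kT+\delta] = C e^{A\delta} x[kT] + w_2[kT+\delta] + \int_{kT}^{kT+\delta} C e^{A(kT+\delta-\tau)} \left( B\,u[\tau] + w_1[\tau] \right) d\tau$$

Since the zeroth-order hold keeps $u$ equal to $u[kT]$ throughout this interval, this becomes:

$$y[kT+\delta] = C e^{A\delta} x[kT] + w_2[kT+\delta] + C \int_0^{\delta} e^{A(\delta-\tau)}\,d\tau\; B\,u[kT] + C \int_0^{\delta} e^{A(\delta-\tau)}\,w_1[kT+\tau]\,d\tau \qquad (2.11)$$

In its discrete-time form: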
Figure 2-2: From [1], page 623. (The figure marks, within each sample period $T$, the period in which compensator calculations must be completed.)

$$y(k) = L\,x(k) + D\,u(k) + w_2(k) \qquad (2.12)$$

where

$$L = C\,\Phi(\delta), \qquad D = C\,\Gamma(\delta), \qquad w_2(k) = w_2[kT+\delta] + C \int_0^{\delta} \Phi(\delta-\tau)\,w_1[kT+\tau]\,d\tau$$
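The sample-skew matrices can be computed in the same way as the ordinary discretization, assuming NumPy. The function names are hypothetical, and the plain truncated Taylor series is adequate only because $\|A\delta\|$ is small in the illustration; a library matrix exponential would be used otherwise. Setting $\delta = 0$ gives $L = C$ and $D = 0$, recovering the unskewed output equation.

```python
import numpy as np

def expm_series(M, terms=40):
    """Truncated Taylor series for e^M (adequate only for small, well-scaled M)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

def skewed_output_matrices(A, B, C, delta):
    """L = C Phi(delta) and D = C Gamma(delta) for an output sampled delta seconds
    after the control update, with Phi(t) = e^{At}, Gamma(t) = int_0^t Phi(tau) B dtau."""
    n, m = B.shape
    aug = np.zeros((n + m, n + m))
    aug[:n, :n] = A
    aug[:n, n:] = B
    E = expm_series(aug * delta)
    return C @ E[:n, :n], C @ E[:n, n:]

# Usage: double integrator, position measured, output sampled 0.2 s after u changes.
A_di = np.array([[0.0, 1.0], [0.0, 0.0]])
B_di = np.array([[0.0], [1.0]])
C_di = np.array([[1.0, 0.0]])
L_skew, D_skew = skewed_output_matrices(A_di, B_di, C_di, 0.2)
```

Here $D = \delta^2/2 = 0.02$ is the part of the newly applied control that has already reached the sampled output, which is exactly the feedthrough term introduced by the skew.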

Model equation (2.12) must replace (2.5). Two complications have been introduced: the feedthrough term $D\,u(k)$, and the nature of the noise $w_2(k)$. The noise vectors $w_1(k)$ and $w_2(k)$ have become correlated due to the difference between the input and output clock phases:


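$$E\left\{ \begin{bmatrix} w_1(k) \\ w_2(k) \end{bmatrix} \begin{bmatrix} w_1'(l) & w_2'(l) \end{bmatrix} \right\} = \begin{bmatrix} \Theta_{11} & \Theta_{12} \\ \Theta_{12}' & \Theta_{22} \end{bmatrix} \delta_{kl} \qquad (2.13)$$

where:

$$\delta_{kl} = \begin{cases} 1 & \text{for } k = l \\ 0 & \text{otherwise} \end{cases}$$

$$\Theta_{11} = \int_0^T \Phi(\tau)\,\Xi_1\,\Phi'(\tau)\,d\tau$$

$$\Theta_{22} = \frac{1}{T}\,\Xi_2 + C \int_0^{\delta} \Phi(\tau)\,\Xi_1\,\Phi'(\tau)\,d\tau\; C'$$

$$\Theta_{12} = \int_0^{\delta} \Phi(T-\tau)\,\Xi_1\,\Phi'(\delta-\tau)\,d\tau\; C'$$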
0 


By restricting $\hat{x}(k+1)$, or equivalently $u(k+1)$, to depend on the observations up to and including $y(k)$ only, the optimal compensator equations are also modified [1]:

$$\hat{x}(k+1) = \Phi\,\hat{x}(k) + \Gamma\,u(k) + K\left( y(k) - L\,\hat{x}(k) - D\,u(k) \right)$$

$$u(k+1) = -G\,\hat{x}(k+1) \qquad (2.14)$$


where $K$ is the new (steady-state) optimal filter gain matrix (n×p) and $G$ is the optimal regulator gain matrix (m×n). These matrices satisfy discrete algebraic Riccati equations that can be derived from [1] for the discretized plant and compensator described in (2.4), (2.6), (2.8), and (2.12)-(2.14) (equation (2.16) is also presented in [26]):




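$$P = \left(\Phi - \Gamma \hat{R}^{-1} M'\right)'\,P\,\left(\Phi - \Gamma G\right) + \hat{Q} - M \hat{R}^{-1} M' \qquad (2.15)$$

where $G = \left(\hat{R} + \Gamma' P \Gamma\right)^{-1} \Gamma' P \left(\Phi - \Gamma \hat{R}^{-1} M'\right) + \hat{R}^{-1} M'$

and

$$\Sigma = \left(\Phi - K L\right) \Sigma\, \Phi' + \Theta_{11} - K\,\Theta_{12}' \qquad (2.16)$$

where $K = \left(\Phi \Sigma L' + \Theta_{12}\right)\left(\Theta_{22} + L \Sigma L'\right)^{-1}$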
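A minimal sketch of solving (2.15) by fixed-point iteration of the corresponding Riccati difference equation, assuming NumPy and assuming the data are such that the iteration converges (stabilizable and detectable). Pass $\hat{Q}$ and $\hat{R}$ as `Q` and `R`. The filter equation (2.16) is handled through the usual regulator/estimator duality ($\Phi \to \Phi'$, $\Gamma \to L'$, $\hat{Q} \to \Theta_{11}$, $\hat{R} \to \Theta_{22}$, $M \to \Theta_{12}$, $K = G'$), whose fixed point satisfies (2.16). The function names and iteration counts are illustrative; a dedicated Riccati solver would normally be used.

```python
import numpy as np

def regulator_gain(Phi, Gamma, Q, R, M, iters=500):
    """Iterate the Riccati equation (2.15) to a fixed point; return (P, G)."""
    Rinv_Mt = np.linalg.solve(R, M.T)    # R^-1 M'
    Phi_t = Phi - Gamma @ Rinv_Mt        # Phi - Gamma R^-1 M'
    Q_t = Q - M @ Rinv_Mt                # Q - M R^-1 M'

    def gain(P):
        # G = (R + Gamma' P Gamma)^-1 Gamma' P (Phi - Gamma R^-1 M') + R^-1 M'
        return np.linalg.solve(R + Gamma.T @ P @ Gamma, Gamma.T @ P @ Phi_t) + Rinv_Mt

    P = np.array(Q, dtype=float)
    for _ in range(iters):
        P = Phi_t.T @ P @ (Phi - Gamma @ gain(P)) + Q_t
    return P, gain(P)

def filter_gain(Phi, L, Theta11, Theta22, Theta12, iters=500):
    """Solve (2.16) through duality with (2.15); return (Sigma, K)."""
    Sigma, Kt = regulator_gain(Phi.T, L.T, Theta11, Theta22, Theta12, iters)
    return Sigma, Kt.T

# Usage: scalar check, Phi = Gamma = Q = R = 1 and M = 0, where P is the golden ratio.
P_s, G_s = regulator_gain(np.eye(1), np.eye(1), np.eye(1), np.eye(1), np.zeros((1, 1)))
```

In the scalar check the equation reduces to $P^2 = P + 1$, so $P = (1+\sqrt{5})/2$ and $G = P/(1+P) \approx 0.618$.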

With this formulation, the compensator (2.14) can actually be implemented so long as $0 \le \delta \le T - t_c$, since the time between the reception of $y(k)$ and the generation (sampling) of $u(k+1)$ must be long enough (at least $t_c$ seconds) to complete the computations involved. Whenever the calculation time is comparable to the sample period, or the sample rate is much greater than the system bandwidth, it is advantageous to choose $\delta = 0$. Such a choice simplifies (2.16), since $\Theta_{12} = 0$; allows for a simpler hardware clocking arrangement for the samplers; and can also reduce the on-line computation time $t_c$, since $D = 0$. For the examples treated in this thesis, $\delta$ will be assumed to be zero for simplicity. The results easily extend to the non-zero $\delta$ case.
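A minimal sketch of evaluating (2.14) directly, one update per sample, assuming NumPy and column-vector signals; the class and method names are hypothetical. Because the value returned by `step(y_k)` is $u(k+1)$ and depends only on $y(k)$, it can be computed during the $T - \delta$ seconds before the hold needs it, which is the timing condition just stated. Evaluating the ideal equations in this way is also an instance of what Chapter 1 calls the simple structure.

```python
import numpy as np

class PredictorCompensator:
    """Direct evaluation of (2.14); all arguments are 2-D NumPy arrays."""

    def __init__(self, Phi, Gamma, L, D, K, G):
        self.Phi, self.Gamma, self.L, self.D, self.K, self.G = Phi, Gamma, L, D, K, G
        self.x_hat = np.zeros((Phi.shape[0], 1))  # x_hat(0) = 0
        self.u = -G @ self.x_hat                  # u(0) = -G x_hat(0)

    def step(self, y_k):
        """Given y(k), advance the estimate to x_hat(k+1) and return u(k+1)."""
        innovation = y_k - self.L @ self.x_hat - self.D @ self.u
        self.x_hat = self.Phi @ self.x_hat + self.Gamma @ self.u + self.K @ innovation
        self.u = -self.G @ self.x_hat
        return self.u

# Usage with hypothetical scalar parameters and delta = 0 (so D = 0).
comp = PredictorCompensator(np.array([[0.9]]), np.array([[0.5]]), np.array([[1.0]]),
                            np.array([[0.0]]), np.array([[0.4]]), np.array([[0.3]]))
u1 = comp.step(np.array([[1.0]]))  # response to a unit measurement at k = 0
u2 = comp.step(np.array([[0.0]]))
```

Counting operations in `step` gives a feel for why this structure uses many coefficient multiplications: every entry of $\Phi$, $\Gamma$, $L$, $D$, $K$, and $G$ is a separate multiplier.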

In this study, only single-input single-output plants will be considered ($m = p = 1$). With this choice, we can naturally build on the existing digital filtering results, and still bring out the issues we wish to discuss. Consideration of the multiple-input multiple-output case would raise even more issues, and probably obscure the points we wish to make. Even in digital signal processing, there are very few multiple-input multiple-output results. The extension of our results for control systems to the multiple-input multiple-output case would be valuable, and in most cases is not too difficult. Topics such as multiple-input scaling, multiple-output pipelining, and multi-loop limit cycles are discussed in some detail in the closing chapter of this thesis.
Chapter 3: Compensator Structures


§3.1 Introduction

Chapter 2 has described the background and basic derivation of the LQG compensator. The net result was the set of equations (2.14). Since the plant is connected to the compensator at only two points, $u$ and $y$, the ideal compensator can be completely described by an input-output map, or transfer function (recall that we are concentrating on the single-input single-output case). In terms of the parameters in (2.14), with $\delta = 0$ so that $D = 0$, this transfer function is written:


$$H(z) = \frac{U(z)}{Y(z)} = -G\,(zI - \Phi + KL + \Gamma G)^{-1} K \qquad (3.1)$$


When expressed as a ratio of polynomials, (3.1) has the form (3.2), where n is the order of the plant and thus of the LQG compensator. The lack of a term $a_0$ in the numerator follows from the dependence of u(k) only on past values of y, as explained in Chapter 2.

$$H(z) = \frac{a_1 z^{-1} + a_2 z^{-2} + \cdots + a_n z^{-n}}{1 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_n z^{-n}} \qquad (3.2)$$


Equation (3.1) or (3.2) represents the ideal discrete-time response of the LQG compensator. Note that in these transfer functions, y represents the compensator input and u the output, which is the reverse of the filtering case typically considered in digital signal processing.
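The reduction of (3.1) to the polynomial form (3.2) can be checked numerically. The Python sketch below (assuming NumPy; the matrices are arbitrary illustrative values, not a design from this thesis) takes (3.1) in the form $H(z) = -G(zI - \Phi + KL + \Gamma G)^{-1}K$, sets $A = \Phi - KL - \Gamma G$, and uses the single-input single-output identity $\det(zI - A + KG) = \det(zI - A)\,(1 + G(zI - A)^{-1}K)$. The numerator of H(z) is then $\det(zI - A) - \det(zI - A + KG)$, whose leading coefficient cancels; that cancellation is exactly the missing $a_0$ term.

```python
import numpy as np

def lqg_tf_coeffs(Phi, Gamma, K, L, G):
    """Return (a, b) of (3.2) for H(z) = -G (zI - Phi + K L + Gamma G)^{-1} K.

    SISO shapes: Phi (n,n), Gamma (n,1), K (n,1), L (1,n), G (1,n).
    """
    A = Phi - K @ L - Gamma @ G            # compensator dynamics matrix
    den = np.poly(A)                       # det(zI - A): [1, b1, ..., bn]
    num = den - np.poly(A - K @ G)         # [0, a1, ..., an]
    assert abs(num[0]) < 1e-12             # no a0 term: strictly proper
    return num[1:], den[1:]

# Illustrative second-order numbers (hypothetical, not from a real design).
Phi = np.array([[1.0, 0.1], [0.0, 0.9]])
Gamma = np.array([[0.005], [0.1]])
K = np.array([[0.6], [0.4]])
L = np.array([[1.0, 0.0]])
G = np.array([[2.0, 0.5]])
a, b = lqg_tf_coeffs(Phi, Gamma, K, L, G)

# Cross-check: evaluate (3.1) directly and (3.2) from (a, b) at one point z.
z = 0.8 + 0.3j
n = Phi.shape[0]
H_direct = (-G @ np.linalg.solve(z * np.eye(n) - Phi + K @ L + Gamma @ G, K))[0, 0]
zi = 1 / z
H_poly = zi * np.polyval(a[::-1], zi) / (1 + zi * np.polyval(b[::-1], zi))
print(abs(H_direct - H_poly))   # agrees to rounding error
```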

Now consider that (3.1) or (3.2) is to be implemented digitally (as a digital network, or filter [31]). Figure 3-1 presents a simple block diagram of this system. The transfer function (3.1) must now be implemented in finite precision with as little degradation in some system performance measure as possible, subject to certain constraints on the speed and cost of the attendant hardware. In the setting of a steady-state LQG problem, it is convenient to select the performance index J in (2.6) as the measure of performance, since it reflects the weighted steady-state root-mean-square state and control fluctuations. It would also have been possible to choose a criterion such as phase margin, output noise power, or any combination of stability or noise measures. If the problem under consideration were simply a Kalman filter, then a suitable performance measure would be the trace of the error covariance matrix. We have chosen J in order to present our results in a specific context. These results extend in a simple and direct fashion to the error covariance trace, and with more difficulty to phase margin and gain margin measures.

Figure 3-1: Plant and Digital Compensator

In this chapter, we will discuss the concept of structures for digital compensators, and examine accurate ways of representing the arithmetic operations implicit in such structures. Adapting the results of digital signal processing, we will develop an accurate notation for compensator structures. Several classes of structures will then be presented using this new notation.

§3.2 Structures and Notation 

As explained in Chapter 1, the term implementation includes the choice of a suitable structure to approximate (3.1) (or (3.2)) assuming fixed-point arithmetic, and the specification of the hardware architecture and components. This section will adapt digital filtering concepts to develop structures for digital compensators and to formulate an accurate notation for these structures. The state space form common in control applications will be shown to be inadequate for this purpose.

The term structure will be employed to specify the exact finite-precision mathematical procedure by which the compensator output samples u are generated from its input samples y. All structures for implementing a given filter or compensator would perform identically under infinite-precision arithmetic, but they produce different quantization noise, coefficient quantization effects, and limit cycles in the (realistic) finite-precision environment.

Consider a very simple example. Assume that an ideal compensator has been designed, and that its (infinite-precision) transfer function is:

$$H(z) = \frac{z^{-1}}{1 + 1.11 z^{-1} + 0.287 z^{-2}} \qquad (3.3)$$

Figure 3-2a shows a signal flow graph [28,29] of one possible structure, the direct form II [28], for implementing (3.3).

Figure 3-2: Example Structures. (a) Direct form II, with b_1 = 1.11 (ideal, ∞ bits) or 1.109375 (10 bits), and b_2 = 0.287 (ideal) or 0.28515625 (10 bits); (b) cascade form.

The infinite-precision values for $b_1$ and $b_2$ can be read directly from (3.3). Given only 10-bit coefficient registers, these values must be quantized (assume rounding). Reserving one bit for the integral portion of the coefficient word (bits to the left of the binary point), one sign bit, and 8 bits for the fractional portion, the rounded coefficient values would be 1.109375 and 0.28515625.

Figure 3-2b shows the flow graph of another common structure, the cascade form. Here we realize (3.3) with a cascade of two first-order filter sections. The coefficients $a_1$ and $a_2$ can be found by factoring the denominator of (3.3): $1 + 1.11 z^{-1} + 0.287 z^{-2} = (1 + 0.70 z^{-1})(1 + 0.41 z^{-1})$. Again, the ideal values must be rounded to fit 10-bit words, producing $a_1 = 0.69921875$ and $a_2 = 0.41015625$.

Now let us examine the performance of these two structures given their respective finite-precision coefficients. The (10-bit) direct form II and the cascade have the transfer functions shown in (3.4) and (3.5) respectively:

$$H(z) = \frac{z^{-1}}{1 + 1.109375 z^{-1} + 0.28515625 z^{-2}} \qquad (3.4)$$

$$H(z) = \frac{z^{-1}}{1 + 1.109375 z^{-1} + 0.2867889404296875 z^{-2}} \qquad (3.5)$$

The cascade's $z^{-1}$ coefficient is $a_1 + a_2 = 1.109375$, but its $z^{-2}$ coefficient is the product $a_1 a_2$, which is not the rounded value of 0.287. Clearly these two structures produce slightly different transfer functions under finite precision, and we have not even considered their respective quantization noise and limit cycle behavior. Thus different structures will in general result in different finite-precision performance, even though their infinite-precision counterparts have equivalent performance (that of the ideal design).
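The coefficient arithmetic above is easy to reproduce. The Python sketch below assumes the 10-bit format just described (sign bit, one integer bit, eight fractional bits, rounding to the nearest multiple of $2^{-8}$) and recovers the ideal cascade coefficients by factoring the quadratic denominator of (3.3) before rounding them.

```python
import math

FRAC_BITS = 8  # assumed 10-bit word: sign bit, one integer bit, 8 fractional bits

def quantize(x, frac_bits=FRAC_BITS):
    """Round x to the nearest multiple of 2**-frac_bits (coefficient rounding)."""
    scale = 2 ** frac_bits
    return round(x * scale) / scale

b1, b2 = 1.11, 0.287                      # ideal denominator of (3.3)

# Direct form II: b1 and b2 are quantized directly.
df2 = (quantize(b1), quantize(b2))

# Cascade: factor 1 + b1 z^-1 + b2 z^-2 = (1 + a1 z^-1)(1 + a2 z^-1), then round a1, a2.
disc = math.sqrt(b1 * b1 - 4.0 * b2)
a1, a2 = quantize((b1 + disc) / 2), quantize((b1 - disc) / 2)
cascade = (a1 + a2, a1 * a2)              # denominator the cascade actually realizes

print(df2)        # (1.109375, 0.28515625)
print((a1, a2))   # (0.69921875, 0.41015625)
print(cascade)    # (1.109375, 0.2867889404296875)
```

The product $a_1 a_2$ needs 16 fractional bits, so the cascade realizes a $z^{-2}$ coefficient that the direct form II could not even store in its 10-bit registers.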

In order to discuss or analyze different implementation structures, one must have a notation (other than the pictorial signal flow graph) that accurately reflects these differences. From the system-theoretic viewpoint, it seems natural to examine the discrete-time state space representation of a digital filter (with input u and output y):

$$v_{k+1} = \Psi_{11} v_k + \Psi_{12} u_k, \qquad y_k = \Psi_{21} v_k + \Psi_{22} u_k \qquad (3.6)$$

In this representation, the states v are defined to be the outputs of the delay elements in a signal flow graph, and the multiplier coefficients in $\Psi_{11}$, $\Psi_{12}$, $\Psi_{21}$, and $\Psi_{22}$ are the gains between state or input nodes and next-state or output nodes.

Unfortunately, while this form of notation does accurately represent a class of structures, it is not sufficiently general to represent the arithmetic operations associated with any structure. This lack of generality arises in representing structures whose signal flow graphs must have intermediate nodes, that is, nodes which are not state nodes or the input or output node. Figure 3-3 presents such a structure, a two-pole two-zero direct form II structure. Nodes #C and #D are state nodes, node #A is the input node, and node #E is the output node. However, the $a_0$ branch begins at an intermediate node, node #B. Thus there would be no way to include the coefficient $a_0$ as an entry in any of the state space matrices $\Psi_{11}$, $\Psi_{12}$, $\Psi_{21}$, or $\Psi_{22}$. From another viewpoint, the state space representation lacks any way of expressing the implicit ordering, or precedence, associated with the operations involved in certain filter structures. For state space representations, all multiplications can occur at once (independently), and then all additions can occur. For the direct form II structure of figure 3-3, the multiplications by $-b_1$ and $-b_2$ must precede the addition at node #B, which must then precede the following multiplication by $a_0$. This sequence of operations cannot be adequately expressed by equations of the form (3.6). This point is clearly illustrated by Willsky [16], pages 122-124.

At this point it is convenient to turn to the field of digital signal processing for an adequate way to represent filter structures. Crochiere [31,32] has described matrix equations for correctly computing the node signal values in any filter structure. Let the signal value at node i (of $N_0$ nodes) at time k be $y_i(k)$ and the external input to node i be $u_i(k)$. Between any two nodes j and i there can exist one interconnecting branch of constant gain $f^c_{ji}$ and/or one multiply-and-delay branch $f^d_{ji}$. These branches and their interconnected nodes form an elementary network. (We have further assumed that all values $f^d_{ji}$ are either zero or one, with no loss of generality.) For an elementary network, then, the node value $y_i(k)$ may in general depend on all node values at time k-1 and some of the node values at time k, depending on whatever branches exist:

$$y_i(k) = \sum_{j=1}^{N_0} f^c_{ji}\, y_j(k) + \sum_{j=1}^{N_0} f^d_{ji}\, y_j(k-1) + u_i(k) \qquad (3.7)$$

Thus $F_c$ is an $N_0 \times N_0$ matrix of constant-branch coefficients, and $F_d$ is an $N_0 \times N_0$ matrix of delay-branch coefficients. In most networks, a substantial number of the entries in $F_c$ and $F_d$ are zero, and, as stated above, the remaining entries in $F_d$ are ones. In z-transform notation, the vector quantity Y(z) can be written:

$$Y(z) = U(z) + F_c' Y(z) + F_d' Y(z) z^{-1} \qquad (3.8)$$

The transfer function matrix H(z), defined by $Y(z) = H'(z)\, U(z)$, can be derived from (3.8):

$$H(z) = [I - F_c - F_d z^{-1}]^{-1} \qquad (3.9)$$

Now let us look at computing the node signal values. These calculations must occur between the time instants k-1 and k. Some of the node updates will involve the past values at time k-1, and some will involve already-updated values. Thus the node values must be computed in the proper order. For example, the first node value to be updated should not depend on any other updated node values, since these would not yet have been computed. Thus, in terms of the matrix notation above, a correct node precedence, or ordering, depends only on the constant-coefficient branches, since all delayed values y(k-1) are known at time k. Crochiere [32] describes a formal node-ordering technique:

(1) All nodes entered by inputs or delay branches only are placed in node class 1.

(2) Remove from the network all class 1 nodes and any branches connected to them.

(3) Repeat steps 1 and 2 on the remaining network, for node classes 2, 3, ..., until all nodes are classified.

(4) Order from 1 to $N_0$ all nodes, first using all the class 1 nodes, then class 2, and so on.

This technique will not result in a unique ordering of the nodes, but the ordering produced will satisfy the above-mentioned computational constraints.

If this ordering procedure can be carried out, the digital network, or structure, is computable, and the resulting $F_c'$ matrix is zero on and above the main diagonal. If not, the network has at least one closed loop without delay, and does not represent an implementable structure. Note that a non-recursive structure has an ordering whereby $F_d'$ is also zero on and above the main diagonal.
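The node-ordering steps can be sketched in a few lines of Python (assuming NumPy). The function names are hypothetical, and the branch convention (`fc[j, i]` nonzero for a constant-gain branch from node j to node i) is an assumption chosen to match (3.7). The example encodes the five-node direct form II of figure 3-3 with arbitrary coefficient values; a delay-free loop makes steps (1) and (3) stall, and the sketch then reports the network as noncomputable.

```python
import numpy as np

def crochiere_classes(fc):
    """Precedence classes of a network, following steps (1)-(3) above.

    fc[j, i] != 0 means a constant-gain (delay-free) branch from node j to node i.
    Inputs and delay branches never constrain the order, so only fc is needed.
    Returns a list of classes, or None if a delay-free loop makes it noncomputable.
    """
    remaining = set(range(fc.shape[0]))
    classes = []
    while remaining:
        # Steps (1)/(3): nodes not entered by a constant branch from an unclassified node.
        cls = sorted(i for i in remaining if not any(fc[j, i] != 0 for j in remaining))
        if not cls:
            return None
        classes.append(cls)
        remaining -= set(cls)      # Step (2): remove the class and its branches.
    return classes

def node_order(fc):
    classes = crochiere_classes(fc)
    return None if classes is None else [i for c in classes for i in c]  # Step (4)

# Figure 3-3 with nodes A, B, C, D, E labelled 0..4 (coefficient values arbitrary).
b1, b2, a0, a1, a2 = 1.11, 0.287, 0.5, 0.2, -0.3
fc = np.zeros((5, 5))
fc[2, 0], fc[3, 0] = -b1, -b2                 # C -> A, D -> A
fc[0, 1] = 1.0                                # A -> B
fc[1, 4], fc[2, 4], fc[3, 4] = a0, a1, a2     # B, C, D -> E
order = node_order(fc)
print(order)                                  # [2, 3, 0, 1, 4], i.e. C, D, A, B, E
Fc_t = fc[np.ix_(order, order)].T             # F_c' in the new node numbering
print(np.all(np.triu(Fc_t) == 0))             # True: zero on and above the diagonal

# A delay-free loop (0 -> 1 -> 0) is rejected.
loop = np.array([[0.0, 1.0], [0.5, 0.0]])
print(crochiere_classes(loop))                # None
```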

As an example of this matrix signal-flow-graph formulation, consider the five-node structure of figure 3-3. Using the ordering algorithm presented above, nodes #C and #D fall into class 1, node #A falls into class 2, node #B into class 3, and node #E into class 4. Thus we can define nodes #1 through #5 with the ordering C, D, A, B, E. The following five equations now define the (frequency) response of the network:

$$\begin{aligned} Y_1 &= z^{-1} Y_4 \\ Y_2 &= z^{-1} Y_1 \\ Y_3 &= -b_1 Y_1 - b_2 Y_2 + U_3 \\ Y_4 &= Y_3 \\ Y_5 &= a_1 Y_1 + a_2 Y_2 + a_0 Y_4 \end{aligned} \qquad (3.10)$$

The 5×5 matrices $F_c$ and $F_d$ can be formed using (3.8) and (3.10), and the resulting matrix H(z) is given in (3.11):

$$H(z) = \begin{bmatrix} 1 & -z^{-1} & b_1 & 0 & -a_1 \\ 0 & 1 & b_2 & 0 & -a_2 \\ 0 & 0 & 1 & -1 & 0 \\ -z^{-1} & 0 & 0 & 1 & -a_0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}^{-1} \qquad (3.11)$$


For a single-input single-output digital filter such as the one in figure 3-3, we specify only the scalar input-output map $H_{ij}(z)$ ($H_{35}(z)$ in the example above). The remaining entries of H(z) represent transfer functions from or to nodes that are internal to the structure.
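As a check on (3.9)-(3.11), the following Python sketch (assuming NumPy) evaluates $H(z) = [I - F_c - F_d z^{-1}]^{-1}$ at one point for the figure 3-3 network, numbered C, D, A, B, E, and compares the entry from node #3 to node #5 with the direct form II transfer function $(a_0 + a_1 z^{-1} + a_2 z^{-2})/(1 + b_1 z^{-1} + b_2 z^{-2})$. The convention that `F[j, i]` holds the branch from node j to node i, so that `H[i, j]` maps an input at node i to node j, is an assumption of this sketch, and the coefficient values are illustrative.

```python
import numpy as np

def node_transfer_matrix(fc, fd, z):
    """H(z) = [I - F_c - F_d z^-1]^-1 at a single complex z, as in (3.9).

    fc[j, i] and fd[j, i] are branches from node j to node i, so that
    Y(z) = H(z)' U(z) and H[i, j] is the transfer from an input at node i to node j.
    """
    n = fc.shape[0]
    return np.linalg.inv(np.eye(n) - fc - fd / z)

# Figure 3-3 in the order C, D, A, B, E (nodes #1..#5 -> indices 0..4).
b1, b2, a0, a1, a2 = 1.11, 0.287, 0.5, 0.2, -0.3   # illustrative values only
fc = np.zeros((5, 5))
fd = np.zeros((5, 5))
fc[0, 2], fc[1, 2] = -b1, -b2                # Y3 = -b1 Y1 - b2 Y2 + U3
fc[2, 3] = 1.0                               # Y4 = Y3
fc[0, 4], fc[1, 4], fc[3, 4] = a1, a2, a0    # Y5 = a1 Y1 + a2 Y2 + a0 Y4
fd[3, 0] = 1.0                               # Y1 = z^-1 Y4
fd[0, 1] = 1.0                               # Y2 = z^-1 Y1

z = np.exp(0.7j)                             # a point on the unit circle
h35 = node_transfer_matrix(fc, fd, z)[2, 4]  # input at node #3, output at node #5
expected = (a0 + a1 / z + a2 / z**2) / (1 + b1 / z + b2 / z**2)
print(abs(h35 - expected))                   # agrees to rounding error
```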

A deficiency of the matrix notation above appears when we consider structural transformations. Such transformations are very useful in generating new structures with the same infinite-precision transfer function as some original structure, but with different finite-precision performance. For a structure which can be accurately represented with state space notation, the similarity transform fills this role. For the Crochiere matrix representation, a transformation technique also exists [17]. This technique must be constrained so that the transformed structure is computable; in other words, it must have no delay-free loops [17]. However, even with this restriction, the number of delay branches and the degree of precedence inherent in the additions and multiplications of the resulting (infinite-precision-equivalent) structure are in general unpredictable.

To combat this difficulty, a notation as convenient and useful for transformation as the state space form, but with the generality of the Crochiere matrix representation, is desirable. Such a notation, related to the state space notation, has been presented by Chan [17]. As in a state space, define the outputs of delay elements to be the states v, and let y be the filter or compensator input and u be the output. Then the coefficients and the sequence of multiplications and additions in any filter structure can be specified with the following representation:

$$\begin{bmatrix} v(k+1) \\ u(k) \end{bmatrix} = \Psi_q \Psi_{q-1} \cdots \Psi_1 \begin{bmatrix} v(k) \\ y(k) \end{bmatrix} \qquad (3.12)$$

where $\Psi_1, \ldots, \Psi_q$ are matrices representing the arithmetic and quantization operations in the structure. Three important points make (3.12) useful:

(1) Each (rounded) coefficient in the structure occurs once and only once as an entry in one of the $\Psi_i$ matrices. The remainder of the matrix entries are ones and zeros.

(2) All intermediate (non-storage) nodes in a structure are represented in the vectors $r^1(k) = \Psi_1 \begin{bmatrix} v(k) \\ y(k) \end{bmatrix}$, $r^2(k) = \Psi_2 r^1(k)$, ..., $r^{q-1}(k) = \Psi_{q-1} r^{q-2}(k)$. This point is especially important since both the state nodes v and intermediate nodes r must be scaled to satisfy dynamic range constraints (see Chapter 5).

(3) The concept of precedence for the operations (multiplies, adds, and quantizations) is maintained. The ordering of the matrices implies that the operations involved in computing $r^1(k)$ are completed first, then $r^2(k)$ next, and so forth. Thus the matrix $\Psi_q$ contains the operations of lowest precedence, and the parameter q specifies the number of precedence levels.
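The level-by-level evaluation implied by (3.12) can be sketched in Python (assuming NumPy). Each $\Psi_i$ is applied in turn, with an optional rounding step after each level; where rounding actually occurs depends on the hardware, so that hook is an assumption. As an illustration, a one-level direct form II and a two-level cascade realization of (3.3) are compared under floating-point (effectively infinite-precision) arithmetic. The state ordering and the cascade's section ordering ($1/(1 + a_1 z^{-1})$ feeding $z^{-1}/(1 + a_2 z^{-1})$) are hypothetical choices, not necessarily those of figure 3-2.

```python
import numpy as np

def simulate(levels, y_seq, n_states, quantize=None):
    """Run [v(k+1); u(k)] = Psi_q ... Psi_1 [v(k); y(k)] one precedence level at a time.

    levels   -- [Psi_1, ..., Psi_q]; Psi_1 is evaluated first.
    quantize -- optional rounding applied to each level's result (an assumption
                about where products are rounded); None means no rounding.
    """
    v = np.zeros(n_states)
    out = []
    for y in y_seq:
        r = np.append(v, y)            # [v(k); y(k)]
        for psi in levels:             # r^1(k), r^2(k), ..., then [v(k+1); u(k)]
            r = psi @ r
            if quantize is not None:
                r = quantize(r)
        v, u = r[:n_states], r[n_states]
        out.append(u)
    return np.array(out)

b1, b2 = 1.11, 0.287
a1, a2 = 0.70, 0.41                    # ideal factors of the denominator of (3.3)

# One-level direct form II for z^-1 / (1 + b1 z^-1 + b2 z^-2),
# with states v1 = w(k-2), v2 = w(k-1) (one possible ordering).
df2 = [np.array([[0.0, 1.0, 0.0],
                 [-b2, -b1, 1.0],
                 [0.0, 1.0, 0.0]])]

# Two-level cascade (hypothetical section order); level 1 forms r^1 = [v1(k+1); v2(k)].
cascade = [np.array([[-a1, 0.0, 1.0],
                     [0.0, 1.0, 0.0]]),
           np.array([[1.0, 0.0],
                     [1.0, -a2],
                     [0.0, 1.0]])]

impulse = np.r_[1.0, np.zeros(19)]
h_df2 = simulate(df2, impulse, 2)
h_cas = simulate(cascade, impulse, 2)
print(np.max(np.abs(h_df2 - h_cas)))   # tiny: same ideal transfer function
```

Passing, for example, `quantize=lambda r: np.round(r * 256) / 256` together with the 10-bit coefficients would expose the finite-precision differences between the two structures.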

Consider the example of figure 3-2. Using the procedure outlined in Chan [17], the direct form II structure in figure 3-2 has a one-level representation as shown in (3.13), while the cascade structure of figure 3-2 requires two levels to describe its operations (3.14).

$$\begin{bmatrix} v(k+1) \\ u(k) \end{bmatrix} = \Psi_1 \begin{bmatrix} v(k) \\ y(k) \end{bmatrix} \qquad (3.13)$$

$$\begin{bmatrix} v(k+1) \\ u(k) \end{bmatrix} = \Psi_2 \Psi_1 \begin{bmatrix} v(k) \\ y(k) \end{bmatrix} \qquad (3.14)$$


It should be noted here that the representations shown in (3.13) and (3.14) are not unique. The numbering of the intermediate nodes r (within precedence constraints) is arbitrary. (This nonuniqueness is also true of the Crochiere matrix representation, since node numbering within a class is arbitrary.) Furthermore, some of the r nodes are trivial, as can be seen by reversing the procedure and generating a structure directly from (3.14) (see figure 3-4). Nodes $r^1_1(k)$ and $v_1(k+1)$ are equivalent nodes, separated only by a trivial multiplication by one in $\Psi_2$. The same is true of $r^1_2(k)$ and $v_2(k)$. Figure 3-2b is simply a node-minimal version of figure 3-4 [17,32]. However, all such representations are equivalent in terms of their finite-precision behavior: they all effectively represent the same structure [17,32]. It does not matter which is chosen.

Figure 3-4: Exact Structure of (3.14)

In terms of its generality, the notation described by Chan is as useful as the Crochiere representation. In fact, Chan presents a technique for converting from any (elementary) signal flow graph to his state-space-related notation, and then back again to an equivalent signal flow graph. Also, in the context of Chan's notation, we can now see that a state space will represent only a class of structures, namely those with only one inherent level of precedence.

An important advantage of the notation introduced by Chan is the ease with which transformations [17] can be applied to generate new structures that are infinite-precision-equivalent to some original structure. This technique is an adaptation of the similarity transformation used with a (one-level) state space. Define:

$$\hat{\Psi}_i = P_i \Psi_i P_{i-1}^{-1}, \qquad i = 1, \ldots, q \qquad (3.15)$$

where the $P_i$ for $i = 1, \ldots, q-1$ are general nonsingular transformation matrices of appropriate dimension and

$$P_0 = P_q = \begin{bmatrix} P & 0 \\ 0 & 1 \end{bmatrix} \qquad (3.16)$$


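The new (transformed) structure will then have the following representation:

$$\begin{bmatrix} \hat{v}(k+1) \\ u(k) \end{bmatrix} = \hat{\Psi}_q \hat{\Psi}_{q-1} \cdots \hat{\Psi}_1 \begin{bmatrix} \hat{v}(k) \\ y(k) \end{bmatrix}, \qquad \hat{v} = P v \qquad (3.17)$$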


What makes this transformation method so useful is that the original and transformed structures have the same number of states (delays) and the same number of precedence levels. It is also possible to restrict the matrices $P_1, \ldots, P_{q-1}$ to control the number of non-unity, non-zero entries in the new $\hat{\Psi}$ matrices, as explained in Chapter 8.
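The transformation (3.15)-(3.16) can be sketched as follows (Python, assuming NumPy), with $P_0 = P_q = \mathrm{diag}(P, 1)$ so that the input y and output u are untouched. It starts from a two-level cascade of first-order sections (a hypothetical representation used only for illustration, not necessarily the exact matrices of (3.14)) and applies arbitrary nonsingular $P$ and $P_1$. The transformed structure keeps two levels and the same matrix shapes, and its impulse response is unchanged; in general, though, the $P_i$ turn zero and unity entries into new coefficients, which is why restricting them matters.

```python
import numpy as np

def transform_levels(levels, P_state, P_mid):
    """Psi_hat_i = P_i Psi_i P_{i-1}^{-1} for every level, as in (3.15).

    P_state -- nonsingular transform of the delay outputs (v_hat = P_state v).
    P_mid   -- [P_1, ..., P_{q-1}], nonsingular transforms of the r^i vectors.
    P_0 = P_q = diag(P_state, 1), leaving y and u unchanged (3.16).
    """
    n = P_state.shape[0]
    P_end = np.eye(n + 1)
    P_end[:n, :n] = P_state
    Ps = [P_end, *P_mid, P_end]
    return [Ps[i + 1] @ psi @ np.linalg.inv(Ps[i]) for i, psi in enumerate(levels)]

def impulse_response(levels, n_states, length=20):
    """Impulse response of [v(k+1); u(k)] = Psi_q ... Psi_1 [v(k); y(k)]."""
    v, h = np.zeros(n_states), []
    for k in range(length):
        r = np.append(v, 1.0 if k == 0 else 0.0)
        for psi in levels:
            r = psi @ r
        v = r[:n_states]
        h.append(r[n_states])
    return np.array(h)

a1, a2 = 0.70, 0.41
cascade = [np.array([[-a1, 0.0, 1.0], [0.0, 1.0, 0.0]]),      # Psi_1 (2x3)
           np.array([[1.0, 0.0], [1.0, -a2], [0.0, 1.0]])]    # Psi_2 (3x2)

# Arbitrary nonsingular transforms, chosen only for illustration.
P_state = np.array([[2.0, 1.0], [0.5, 1.0]])
P_1 = np.array([[1.0, 3.0], [0.0, 1.0]])
new = transform_levels(cascade, P_state, [P_1])

print(len(new), [m.shape for m in new])   # 2 [(2, 3), (3, 2)]
print(np.allclose(impulse_response(cascade, 2), impulse_response(new, 2)))   # True
```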

Now let us try to apply this valuable notation to represent structures for digital feedback compensators. Unfortunately, the notation described by Chan is not quite adequate for the control setting. To demonstrate this point, let us consider the direct form II structure in figure 3-3 as a compensator structure. In the notation of Chan, this structure will have the following representation:

$$\begin{bmatrix} v(k+1) \\ u(k) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ a_2 & a_1 & a_0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -b_2 & -b_1 & 1 \end{bmatrix} \begin{bmatrix} v(k) \\ y(k) \end{bmatrix} \qquad (3.18)$$


According to this set of equations, the next-state vector v(k+1) is a function of the present state and input (no problem). However, (3.18) also describes the current output to be a function of the current state and input. From the viewpoint of causality this expression must be in error, since some finite amount of time is needed for the computation of u(k) after v(k) and y(k) are generated. In most digital filtering applications, a short delay in obtaining the output (series delay) is of no concern, and hence the representation in (3.18) is adequate for filters. However, in control applications, such delay is critical since the filter is embedded in a feedback loop (recall Chapter 2). Our approach must reflect the true operation of the compensator, accounting for all necessary computational delays.

One simple approach to solving this problem might be to include the delay as an explicit series delay following the compensator described in Chan's notation. Unfortunately, this implies that the delay be included as part of the control system plant. Thus every LQG design would involve initially augmenting the plant, and then designing an optimal LQG compensator. For an nth-order system, this procedure creates an (n+1)th-order augmented system, and thus an (n+1)th-order compensator. Clearly this approach has a disadvantage: it increases the compensator order. Furthermore, by not including the extra delay in some way within the compensator itself, we may have restricted the types of structures that are possible for compensator implementations.

Thus we must search for a better approach. Let us include the extra delay within the compensator structure itself. The compensator design technique of Chapter 2 ensures that the output u(k) depends on past inputs, not present inputs. (This seems to force the compensator to include an entire sample delay time T instead of simply a calculation time delay, which may be considerably shorter. However, recall the sample-skew issue discussed in Chapter 2.) Thus we can represent u(k+1) as a function of v(k) and y(k), rather than as a function of v(k+1) and y(k+1). Then u(k) can be generated by a unit delay following u(k+1). The node u(k) thus becomes an additional compensator state. In terms of adapting the notation of Chan, let us choose u(k) to be the last state (numerically), and write:
$$\begin{bmatrix} v(k+1) \\ u(k+1) \end{bmatrix} = \Psi_q \Psi_{q-1} \cdots \Psi_1 \begin{bmatrix} v(k) \\ u(k) \\ y(k) \end{bmatrix} \qquad (3.19)$$

where the vector v and also the scalar u are the states of the structure (outputs of delay elements). Thus we have slightly altered the notion of a structure for compensators. Unlike filter structures, u(k) is always both an output and a state. The notation in (3.19) for describing compensator structures will be called its modified state space representation.

The major implication of this adaptation is that nth-order compensators will now require structures having n+1 unit delay elements, rather than n as with digital filters. In addition, certain common digital filter structures (for example, the direct form II and the cascade and parallel structures based on it) will no longer appear quite the same when used for digital compensators. Each will have an extra delay at the output node, as compared to the corresponding filter structure. In terms of their modified state space representations, the $\Psi_1$ matrix for such structures will have an all-zero next-to-last column. This must occur whenever the node u(k) does not feed back to the rest of the structure. Section 3.3 will show examples of such structures, and we will still refer to them by their corresponding digital filtering designations (see figures 3-5, 3-6, and 3-8). For the remainder of this thesis, the modified state space of (3.19) will be employed to describe compensator structures, and all signal flow graphs will reflect the delay (state) necessary for u(k).
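The conversion from a filter structure in Chan's notation to the modified state space (3.19) can be sketched in Python (assuming NumPy; the helper names are hypothetical). The example starts from the two-level direct form II of (3.18) with illustrative coefficients, inserts the all-zero u(k) column into $\Psi_1$, and checks that the compensator output is the filter output delayed by one sample, at the cost of one extra state.

```python
import numpy as np

def to_compensator(levels, n):
    """Modified state space (3.19) from a filter structure [Psi_1, ..., Psi_q].

    The filter maps [v(k); y(k)] to [v(k+1); output]; the compensator maps
    [v(k); u(k); y(k)] to [v(k+1); u(k+1)]. Because u(k) does not feed back,
    Psi_1 gains an all-zero column in the u(k) slot (the next-to-last column).
    """
    return [np.insert(levels[0], n, 0.0, axis=1), *levels[1:]]

def filter_response(levels, ys, n):
    v, out = np.zeros(n), []
    for y in ys:
        r = np.append(v, y)
        for psi in levels:
            r = psi @ r
        v = r[:n]
        out.append(r[n])              # output computed from v(k) and y(k)
    return np.array(out)

def compensator_response(levels, ys, n):
    s, out = np.zeros(n + 1), []      # s = [v(k); u(k)]: u(k) is itself a state
    for y in ys:
        out.append(s[n])              # u(k) is already stored at time k
        r = np.append(s, y)
        for psi in levels:
            r = psi @ r
        s = r                         # [v(k+1); u(k+1)]
    return np.array(out)

# Two-level direct form II of (3.18); coefficient values are illustrative.
b1, b2, a0, a1, a2 = 1.11, 0.287, 0.5, 0.2, -0.3
dfii = [np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-b2, -b1, 1.0]]),
        np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [a2, a1, a0]])]
comp = to_compensator(dfii, 2)

ys = np.sin(0.3 * np.arange(30))
hf = filter_response(dfii, ys, 2)
hc = compensator_response(comp, ys, 2)
print(comp[0][:, 2])                          # the all-zero next-to-last column
print(np.allclose(hc[1:], hf[:-1]), hc[0])    # filter output, delayed one sample
```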

One final implication of the adapted concept of a structure should be brought out. In terms of the transformation procedure described in (3.15) and (3.16), a change is necessary to accommodate compensator structures. In (3.16), due to the inclusion of the output as a state, the transformation matrix $P_0$ must now be written

$$P_0 = \begin{bmatrix} P & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The extra row and column in this matrix reflect the modified state space representation, and the unity diagonal entry is necessary since the transformation procedure cannot be permitted to alter the output node.

It is also notationally convenient to define the matrix $\Psi_\infty$. Let the coefficients in each $\Psi_i$ matrix be replaced by their infinite-precision counterparts (their values before rounding). Then $\Psi_\infty$ is defined to be the infinite-precision product $\Psi_q \Psi_{q-1} \cdots \Psi_1$. This matrix will be used in the derivations of Chapters 5, 6, and 8.


§3.3 Classes of Structures 

Before discussing some of the various classes of structures that exist, it is important to understand the different points of comparison that should be considered. Beyond the finite wordlength effects of quantization noise, coefficient rounding, and limit cycles that are treated in Chapters 5, 6, and 7, one must compare the number of delay elements, coefficients (multiplications), additions, and precedence levels, and also the number of scalers needed to satisfy dynamic range constraints. We will examine structures that are typically canonic (minimal) with respect to the number of delay elements, implying a minimal number of storage registers. In order to present specific examples of structures, let us assume that the plant is sixth order (n = 6).

Given the transfer function (3.2), the most straightforward structure to consider is the direct form II [28]. As an LQG compensator structure, its signal flow graph is shown in figure 3-5. It is canonic in delays with 7 (in general, n+1), has 12 coefficients (non-unity multipliers), and requires only one additional scaler. (Scaling, fully discussed in Chapter 5, involves a normalization of the structure so that roundoff noise effects and overflows can be held to a minimum. In this process, some of a structure's coefficients will be altered, including certain unity entries. Such unity entries will be called scaling multipliers, or scalers, and are indicated in signal flow graphs and equations with an asterisk.) The modified state space representation of the direct form II is given below with its two precedence levels. Note that figure 3-5 includes a rough indication of which operations belong in which precedence level.



(3.20) 
Figure 3-5: Direct Form II Structure (sixth order)

The coefficients in this structure (before scaling) are read directly from the 
transfer function (3.2). 

For higher-order filters, the direct form structure is known to perform poorly in terms of the degradation resulting from the use of finite wordlengths [33]. The dynamic range of the coefficients alone grows with filter order when the poles are clustered in the z-plane. (As shown in Chapters 5 and 6, this will be true for the direct form II compensator structure also.) Consequently, factored structures, such as the cascade of second-order filter sections, are commonly used. This structure is obtained from a multiplicative factoring of the transfer function (3.2):


$$H(z) = \frac{(d_1 z^{-1} + d_2 z^{-2})(1 + d_3 z^{-1} + d_4 z^{-2})(1 + d_5 z^{-1} + d_6 z^{-2})}{(1 + c_1 z^{-1} + c_2 z^{-2})(1 + c_3 z^{-1} + c_4 z^{-2})(1 + c_5 z^{-1} + c_6 z^{-2})} \qquad (3.21)$$


If each second-order section is implemented as a direct form II structure, then the cascade compensator structure (figure 3-6) also has 12 coefficients and 7 delays (canonic), but requires four precedence levels ($n_s + 1$, where $n_s$ is the number of sections) and three scalers:
Actually, this cascade can be used to represent several different structures, since the poles and zeros in (3.21) must first be grouped together to form second-order sections, and then the sections must be ordered. Furthermore, the individual sections could be structured in any number of ways (other than the direct form II) [34], [36], each giving rise to a different overall structure.
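Grouping the poles of (3.21) into second-order sections is one of the choices just mentioned. The Python sketch below (assuming NumPy) handles only the denominator: it keeps each complex pole with its conjugate and pairs real poles in sorted order, which is one arbitrary grouping among several. The sixth-order denominator is a hypothetical example, and zero grouping and section ordering are left open.

```python
import numpy as np

def second_order_sections(den, tol=1e-9):
    """Group the poles of 1 + c1 z^-1 + ... + cn z^-n into real second-order factors.

    Complex poles stay with their conjugates; real poles are paired in sorted
    order (one possible grouping). Assumes n is even. Returns rows
    [1, c_(2i-1), c_(2i)] whose product is the original denominator.
    """
    poles = np.roots(den)
    upper = [p for p in poles if p.imag > tol]            # one of each conjugate pair
    real = sorted(p.real for p in poles if abs(p.imag) <= tol)
    sections = [[1.0, -2.0 * p.real, abs(p) ** 2] for p in upper]
    sections += [[1.0, -(p + q), p * q] for p, q in zip(real[0::2], real[1::2])]
    return np.array(sections)

# Hypothetical sixth-order denominator built from three known quadratics.
den = np.convolve(np.convolve([1.0, -1.8, 0.85], [1.0, -1.0, 0.41]), [1.0, 0.3, -0.18])
secs = second_order_sections(den)
prod = np.array([1.0])
for s in secs:
    prod = np.convolve(prod, s)      # multiply the sections back together
print(secs)
print(np.allclose(prod, den))        # the grouping reproduces the denominator
```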

This variety of second-order sections raises an interesting point. If a cascade or parallel combination of a certain type of section is not delay-canonic when applied to digital filters, it may still be delay-canonic when adapted as a compensator structure. Consider the case of a cascade of direct form I [28] second-order sections. Such a filter structure is not delay-canonic (it requires more than n delays). However, due to the added delay used in compensator structures, the direct form I compensator structure is delay-canonic, requiring n+1 unit delay elements. For a sixth-order LQG compensator, such a structure has 7 delay elements and only three (in general $n_s$) precedence levels and two scalers (see figure 3-7):
Another factored form is the parallel structure. This structure is obtained from a partial-fraction expansion of (3.2):

$$H(z) = \frac{e_1 z^{-1} + e_2 z^{-2}}{1 + c_1 z^{-1} + c_2 z^{-2}} + \frac{e_3 z^{-1} + e_4 z^{-2}}{1 + c_3 z^{-1} + c_4 z^{-2}} + \frac{e_5 z^{-1} + e_6 z^{-2}}{1 + c_5 z^{-1} + c_6 z^{-2}} \qquad (3.24)$$

Again, using the direct form II for each individual section results in the compensator structure of figure 3-8, which has two precedence levels, 12 coefficients, 7 delays (canonic), and three scaling multipliers:
(3.25)

The representation in (3.25) can be used to represent several structures, since the real poles, if any, must still be grouped into sections. (The section-ordering and zero-pairing issues of the cascade disappear since all sections are in parallel, and the partial-fraction expansion gives no control over the zero locations.) Also, different types of second-order section structures are possible.
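The partial-fraction expansion behind (3.24) can be sketched in Python (assuming NumPy). Writing $H(z) = \sum_k \rho_k/(z - p_k)$ with $\rho_k = N(p_k)/D^{\prime}(p_k)$, where $D^{\prime}$ is the derivative of the monic denominator, each conjugate pair (or pair of real poles) combines into a section $(e_1 z^{-1} + e_2 z^{-2})/(1 + c_1 z^{-1} + c_2 z^{-2})$. Because (3.2) has no $a_0$ term, no constant branch is needed. The sketch assumes distinct poles and pairs real poles in sorted order; the numerator and denominator are hypothetical.

```python
import numpy as np

def parallel_sections(a, b, tol=1e-9):
    """Expand (3.2) into sections (e1 z^-1 + e2 z^-2) / (1 + c1 z^-1 + c2 z^-2).

    a = [a1..an], b = [b1..bn]. Assumes distinct poles and an even number of
    real poles (paired in sorted order, one arbitrary choice).
    """
    num = np.asarray(a, dtype=float)             # a1 z^(n-1) + ... + an
    den = np.concatenate(([1.0], b))             # z^n + b1 z^(n-1) + ... + bn
    poles = np.roots(den)
    # H(z) = sum_k rho_k / (z - p_k) = sum_k rho_k z^-1 / (1 - p_k z^-1)
    rho = np.polyval(num, poles) / np.polyval(np.polyder(den), poles)
    secs = []
    for p, r in zip(poles, rho):
        if p.imag > tol:                         # combine p with its conjugate
            secs.append(([2 * r.real, -2 * (r * np.conj(p)).real],
                         [-2 * p.real, abs(p) ** 2]))
    real = sorted((p.real, r.real) for p, r in zip(poles, rho) if abs(p.imag) <= tol)
    for (p1, r1), (p2, r2) in zip(real[0::2], real[1::2]):
        secs.append(([r1 + r2, -(r1 * p2 + r2 * p1)], [-(p1 + p2), p1 * p2]))
    return secs

# Hypothetical sixth-order compensator.
den = np.convolve(np.convolve([1.0, -1.8, 0.85], [1.0, -1.0, 0.41]), [1.0, 0.3, -0.18])
a = [0.5, -0.2, 0.1, 0.05, -0.02, 0.01]
secs = parallel_sections(a, den[1:])

z0 = 1.2 + 0.5j                                  # compare both forms at one point
H = np.polyval(a, z0) / np.polyval(den, z0)
H_sum = sum((e1 / z0 + e2 / z0**2) / (1 + c1 / z0 + c2 / z0**2)
            for (e1, e2), (c1, c2) in secs)
print(len(secs), abs(H - H_sum))
```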

A structure that appears on the surface to be more natural for the LQG 
where $I_6$ represents a 6×6 identity matrix. In general this structure (termed the simple form) has three precedence levels, is canonic in delays, and has up to n(n+4) coefficients, depending on the entries in Φ, Γ, K, L, and G. For a sixth-order LQG system, this structure would have up to 60 coefficients. This number of multiplies is quite excessive compared to any commonly used filter structure.
However, this compensator structure (or the similar structure based on the of 

the simple form) is often used for steady-state LQG control applications, more or 
less by default. 

Another broad class of structures includes all the structures whose modified state space representations have just one precedence level matrix. These structures could be called state space structures, since the arithmetic and quantization operations involved can be described using state space notation. Some of these can be generated from the direct form II, cascade, parallel, and simple forms just by multiplying the various $\Psi_i$ matrices together to produce $\Psi_\infty$, and using the result as a structure. The standard observable, standard controllable, and Jordan forms [36] well known to the control and estimation field also correspond to simple one-level structures [15,80]. One could envision such structures being useful for two reasons. First, their performance may be superior to certain multiple-level structures, whether or not they have more coefficients. Second, a one-precedence-level structure allows a faster system sampling rate than a multiple-level structure (see Chapter 4), and thus potentially better performance. An interesting type of one-level filter structure is the minimum roundoff noise structure of Mullis and Roberts [18,37,38] and Hwang [39]. Given no constraints on the coefficients of a one-level delay-canonic filter structure, they have derived a technique for computing the coefficient values producing minimum roundoff noise at the filter output. Unfortunately, this filter structure requires $(n+1)^2$ coefficients. To avoid this problem, the authors have also presented block optimal filter structures, which are cascade or parallel forms composed of minimum noise second-order sections (see also Jackson, Lindgren, and Kim [40]). For a block optimal structure, only 4n+1 coefficients are required. One of the efforts of Chapter 6 will be to extend the ideas of Mullis and Roberts to derive minimum roundoff noise compensator structures.

Using the $f_i$ as the coefficients, a sixth-order block optimal parallel compensator structure would have the following modified state space representation:
Note that the pole-zero pairing issue must still be addressed, as with any parallel form. No additional scaling multipliers are required in (3.27). As with any cascade, a block optimal cascade compensator structure would have the disadvantage of having multiple precedence levels, $n_s$ in this case. (Recall that the parallel block optimal structure requires only one precedence level.)

Besides the direct form and general state space forms, there exist other filter structures not derived from a factorization of the transfer function (3.2). Gray and Markel [41] have presented several ladder and lattice forms that are delay-canonic. Another set of ladder filters [42], also delay-canonic, results from continued-fraction expansions of (3.2). A ladder structure that has received a great deal of attention in the filtering literature is the wave digital filter [43,44,46]. This filter structure is based on analog LC ladder filters and results directly from a consideration of the transmission-line equations of microwave filters: line delay and the transmitted and reflected voltage waves become the sample delay T and the signal variables of the wave digital filter. Characteristics of this structure derive from the passivity and losslessness of its analog counterpart [40] and lead to the absence of limit cycles under specific sign-magnitude truncation arithmetic (see Chapter 6). The coefficient sensitivity of this structure has been shown to be comparatively low [44], and under certain additional constraints [48] it will also be low-noise. Additional improvements have been introduced to reduce the number of multiplies [49] and the number of delays [60]. Meerkötter and Wegener [61] have developed a second-order wave digital filter section which can be the building block of a cascade or parallel form. This section would have four multiplies and two sign-magnitude truncation quantizers, but would require five additional scalers (as opposed to the one or two scalers of most sections). As with many of the digital filter structures, ladder-type structures could easily be adapted for compensator structures by adding a series delay to the filter structure output.
Finally, a general class of optimal structures exists. Chan [17] has described a technique for filters where, through the use of the transformations in (3.15) and (3.16), a scalar function of the structure parameters can be minimized. More importantly, the method will hold almost any desired set of structure-matrix entries constant. Thus we can control the number of coefficients in the structure and their locations while minimizing roundoff noise or coefficient quantization effects, or some combination of the two. Chapter 8 will adapt this useful technique for the optimization of compensator structures, and an example of the constrained minimization of compensator roundoff noise effects will be presented.

This discussion of compensator structures was not intended to present an exhaustive list of possible structures, but only a representative selection. (For example, transpose configurations [30,31] were not considered.) The analyses in Chapters 5, 6, and 7 compare some of these compensator structures with respect to their finite wordlength properties. The overall aim is to provide the reader with a basic grasp of the various structures and of the different criteria for choosing among the different classes of structures, given control and estimation applications.

§3.4 Summary

Beyond a presentation of the more common types of compensator structures, the main point of this chapter was the introduction of the modified state space representation. This representation exactly reflects the computations that determine the performance of a compensator structure when implemented with finite wordlengths, and also the order in which these computations must occur. Unlike the form introduced by Chan [17], which is adequate for digital filters, this representation must include all the inherent delays necessary to complete the operations within the compensator structure. Finally, as with the Chan form, it is possible to apply simple transformations to this representation in order to synthesize a compensator structure with superior finite wordlength performance.
§4.1 Introduction

In this chapter, we will examine the architectural issues involved in the implementation of digital feedback compensators. We will show that the basic concepts of serialism and parallelism as they apply to digital filter structures represented in Chan's notation extend without modification to digital compensator structures represented in the modified state space notation. However, the same cannot be said concerning the application of pipelining techniques to compensators. In fact, we will show that pipelining in control systems brings out another important issue: the interaction between the ideal design procedure described in Chapter 2 and the implementation of the resulting compensator.

Perhaps the most basic issue in any consideration of digital system architecture involves the concepts of serialism and parallelism [31,52,53]. Essentially, this notion involves the degree to which processes, or operations, in the system run in sequence (serially) and the degree to which they execute concurrently (in parallel). At one extreme, any system can be implemented with a completely serial architecture, executing all its processes one at a time. This procedure requires the minimum number of actual hardware modules and the maximum amount of processing time for completion of the system task. On the other hand, any system can also be implemented with a maximally-parallel architecture, having as many concurrent processes as possible. Such a design requires the maximal amount of hardware, but completes the overall system task in minimum time. Thus, the serialism/parallelism tradeoff is another example of the frequently encountered space-time tradeoff [52].
There is an important asymmetry implicit in the exploitation of serialism and parallelism. It is always possible to execute processes one at a time (totally serially). However, it is not always possible to execute them all at once (in a totally parallel manner); there is a minimum amount of serialism required. Figure 4-1 gives a typical example, consisting of three processes (P1, P2, and P3) and data cells [52] for input and output.

Figure 4-1: Three-Process System

Assume that each of the three processes requires t seconds for completion (given specific hardware modules) and that each process executes as soon as all of its inputs are valid. Given a general-purpose computing module, a serial architecture that requires 3t seconds to complete the overall task is clearly possible. On the other hand, figure 4-1 shows that processes P1 and P2 must be finished before process P3 can begin. Consequently, only processes P1 and P2 can operate in parallel. For such an architecture, two hardware modules would be required, and the total computation time would be reduced to 2t seconds. The totally-parallel architecture (total time t with three hardware modules) is not possible for the system of figure 4-1.
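The precedence argument can be checked mechanically. The Python sketch below is a minimal illustration, not a method from the text: the helper `asap_schedule` is hypothetical, and every process is assumed to take one unit of time t. It places each process of figure 4-1 at the earliest step at which all its inputs are valid, then reads off the serial time, the minimum parallel time, and the number of modules that minimum time requires.

```python
from collections import defaultdict


def asap_schedule(deps):
    """Assign each process to the earliest step at which all its inputs are valid.

    deps maps a process name to the processes whose outputs it needs.
    Every process is assumed to take one time unit t.
    """
    level = {}

    def visit(p):
        if p not in level:
            # A process starts one step after its latest predecessor finishes.
            level[p] = 1 + max((visit(q) for q in deps[p]), default=0)
        return level[p]

    for p in deps:
        visit(p)
    return level


# Figure 4-1: P3 needs the outputs of both P1 and P2.
deps = {"P1": [], "P2": [], "P3": ["P1", "P2"]}
levels = asap_schedule(deps)

steps = defaultdict(list)
for p, lv in levels.items():
    steps[lv].append(p)

serial_time = len(deps)                         # one module, in units of t
parallel_time = max(levels.values())            # minimum time, in units of t
modules = max(len(ps) for ps in steps.values())  # modules needed for that time
print(serial_time, parallel_time, modules)      # 3 2 2
```

The number of distinct steps is the minimum degree of serialism; here it is two, which is why the 1t totally-parallel architecture is unreachable.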

Under certain conditions, this 'speed barrier' can be broken through the use of pipelining [31,62]. If the original objective of the system is to perform a task repeatedly (as soon as the present task is completed, a new task begins), then pipelining could realize an effective throughput rate equal to (or at least closer to) that of a totally-parallel architecture. Reconsider figure 4-1. Suppose that a separate hardware module is reserved for each process, the sampling rate is $1/(2t)$, and the maximally-parallel 2t-second architecture is used. The input and output data cells now represent registers clocked at rate $1/(2t)$. Let us examine any 2t-second interval. During the first t seconds, module 3 (for executing process P3) will be idle, since its inputs are not yet valid. During the last t seconds, module 3 will be active and modules 1 and 2 will be idle. The total 2t-second time from a task initiation until its completion cannot be reduced without faster hardware modules. However, the idle modules can be put to use by pipelining the processes: while module 3 is active and modules 1 and 2 would otherwise be idle, the next task may as well begin and use modules 1 and 2. The net result (in this example) is a doubling of the throughput rate (task completions per second) from $1/(2t)$ to $1/t$. It must be stressed here that any given task still takes 2t seconds from start to finish; however, successive task completions occur at t-second intervals. In terms of hardware required, the pipeline would be effected by adding two
Figure 4-2: Models For Pipelining


pipelined case for this example. Basically, the pipeline splits a larger task not implementable in a totally-parallel architecture into smaller sequential sub-tasks, each of which can be implemented in a totally parallel fashion (figure 4-2a). An equivalent viewpoint (figure 4-2b) considers pipelining to be represented by a faster-executing task coupled with some series delay (inherent in the additional clocked registers).
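The latency/throughput distinction above can be made concrete with a small timing model. The following Python sketch is illustrative only: the helpers `schedule` and `stage_usage` are hypothetical, and the model assumes stage 1 (modules 1 and 2 running P1 and P2) and stage 2 (module 3 running P3) each take exactly t seconds. It confirms that every task still takes 2t, that completions come every t once pipelined, and that no stage is ever asked to serve two tasks in the same t-second slot.

```python
def schedule(num_tasks, t, pipelined):
    """(start, finish) times for repeated executions of the figure 4-1 task.

    Stage 1 runs P1 and P2 in parallel (t seconds); stage 2 runs P3 (t seconds).
    Without pipelining a new task starts only when the previous one finishes;
    with pipelining it starts as soon as stage 1 is free again.
    """
    period = t if pipelined else 2 * t
    return [(k * period, k * period + 2 * t) for k in range(num_tasks)]


def stage_usage(tasks, t):
    """Map each t-second slot to the stages busy in it (to detect conflicts)."""
    slots = {}
    for start, _ in tasks:
        s = round(start / t)
        slots.setdefault(s, []).append("stage1")      # modules 1 and 2
        slots.setdefault(s + 1, []).append("stage2")  # module 3
    return slots


t = 1.0
plain = schedule(4, t, pipelined=False)
piped = schedule(4, t, pipelined=True)

latency = [f - s for s, f in piped]                          # each task still takes 2t
plain_gaps = [b[1] - a[1] for a, b in zip(plain, plain[1:])]  # completions every 2t
piped_gaps = [b[1] - a[1] for a, b in zip(piped, piped[1:])]  # completions every t
conflict_free = all(len(v) == len(set(v)) for v in stage_usage(piped, t).values())
print(latency, plain_gaps, piped_gaps, conflict_free)
```

The conflict check is the substantive part: it shows the doubled rate needs no extra computing modules, only the registers that separate the two stages.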

An important application of pipelining is in the implementation of digital filter structures [31,64]. In such a case, the system task corresponds to the generation of a filtered output value from an input sample, and the individual processes correspond to the hardware digital multiplications and additions that exist in the particular structure implemented (ignore A/D and D/A operations for now). Figure 4-3a shows a two-pole digital filter with input y and output u. As shown, the unit delay $z^{-1}$ represents a clocked storage register. Thus, all the arithmetic and quantization operations have one sampling period in which to be completed. Computing the signal u(k+1) at node A in figure 4-3a requires three multiplications and an addition. The multiplications by the two b coefficients can operate in parallel, then the addition occurs, and finally the multiplication by the a coefficient. Using three hardware multipliers instead of two, and assuming negligible add time, the multiply operations can be pipelined and the sampling rate doubled. The new configuration could be implemented with just one additional storage register, represented in figure 4-3b as an additional unit delay. However, this new signal flow graph is not node-minimal, since it contains two states that are exactly equivalent. Removal of one of these states produces the node-minimal signal flow graph shown in figure 4-3c. Thus, the pipelined structure of figure 4-3c has the same number of unit delays (storage registers) as the original structure in figure 4-3a. For this particular example, pipelining did not require the use of more unit delays, although this would not be true in general. Note that each $z^{-1}$ in figures 4-3b and 4-3c represents only half the delay time of those in figure 4-3a if the sampling rate is doubled, as made possible by pipelining.

Figure 4-3: Pipelining a Simple Digital Filter
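A small simulation shows why pipelining a feedforward stage changes nothing but the series delay. The Python sketch below uses a hypothetical two-pole structure, not the exact signal flow graph of figure 4-3: level 1 forms the parallel $b_1$, $b_2$ products and their sum with the input, and level 2 scales that sum by $a$. Inserting a register between the two levels yields exactly the original response delayed by one (now shorter) sample period.

```python
def two_level_filter(y, a, b1, b2, pipelined=False):
    """Hypothetical two-pole structure (not the exact graph of figure 4-3).

    Level 1: w(k) = y(k) - b1 w(k-1) - b2 w(k-2)  (b1, b2 products in parallel)
    Level 2: u(k) = a w(k)
    pipelined=True inserts a register between the levels, so level 2 works
    on the previous sample's sum and each level fits in one sample period.
    """
    w_km1 = w_km2 = 0.0     # recursion states w(k-1), w(k-2)
    reg = 0.0               # pipeline register between level 1 and level 2
    u = []
    for yk in y:
        wk = yk - b1 * w_km1 - b2 * w_km2       # level 1
        if pipelined:
            u.append(a * reg)                   # level 2 on last sample's sum
            reg = wk
        else:
            u.append(a * wk)                    # level 2 in the same period
        w_km1, w_km2 = wk, w_km1
    return u


y = [1.0] + [0.0] * 9                            # unit impulse
h = two_level_filter(y, a=0.5, b1=-0.9, b2=0.2)
hp = two_level_filter(y, a=0.5, b1=-0.9, b2=0.2, pipelined=True)
print(hp == [0.0] + h[:-1])                      # True: original response delayed one sample
```

The register sits outside the recursive loop, which is what allows the transfer function to survive as $z^{-1}H(z)$; the restrictions discussed in §4.2 arise when that is not the case.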

From the example of figure 4-3, it is clear that pipelining ties in closely with the digital filter notion of precedence. Specifically, let us consider node precedence, that is, the precedence relations involved in the addition, multiplication, and quantization operations needed to compute the node signals. In this case, the modified state space representation (see Chapter 3) is very convenient since it explicitly shows the number of precedence levels involved. If a structure represented in this notation has only one precedence level, then it can have a totally-parallel architecture (parallel in terms of the multiply/add computations involved in each precedence level). If more than one such level is required, no totally-parallel architecture is possible, and the number of levels q will equal the minimum degree of serialism required. Pipelining, if applicable, would actually change the structure by inserting unit delays so that a new structure (one with fewer levels and thus a faster sample clock rate) is formed. The pipelined structure would have the same transfer function as the original non-pipelined structure, except for some series delay, and would probably have more state nodes. Series delay is of little consequence in most digital filtering applications. Thus a two-level structure can be designed for a sampling period of t/2 even though the calculations require t seconds, since pipelining (given a two-level structure) will fit the calculations into a t/2 slot at the expense only of a series delay of t/2 seconds. Equations (4.1) through (4.4) show the modified state space representations and transfer functions of the non-pipelined (sampling period t) and pipelined (sampling period t/2) filters of figures 4-3a and 4-3c, respectively:
Note the reduction from two levels to one level (see (4.1) and (4.3)), allowing the doubled sampling rate, and also the extra $z^{-1}$ factor in the numerator of (4.4). The number of states in (4.3) remained at three since no additional storage registers were actually added to effect the pipeline.

Let us now consider pipelining as it applies just to the multiply operations in a structure. Such a consideration will be valuable whenever the multiply time dominates over all the addition and quantization operation times in a structure, a situation that is not uncommon in microprocessor-based digital systems. Since we are neglecting all calculation times other than the multiply times, it is sufficient to know the precedence of the multiply operations alone in order to determine the architectures that are possible. Thus the node precedence evident from the different matrices of a modified state space representation will not, in general, be adequate to describe the multiplier precedence relations. Such relations can be determined from the signal flow graph or from an examination of the specific location of each multiplier coefficient in the matrices of the modified state space representation. In either case, the multipliers can be grouped into precedence classes. Frequently, the number of multiplier precedence classes and node precedence levels will be the same, but the multiplier coefficients in class 1 (of highest multiplier precedence) and the multiplier coefficients in node precedence level 1 (the first-level matrix) need not be identical. It will be true that all the multiplier coefficients in the first-level matrix will also be in multiplier precedence class 1. Furthermore, multiple-level structures often have fewer multiplier classes than node precedence levels.
As an example, consider the cascade structure of figure 3-6 and its modified state space representation (3.22). Assume all scaling multipliers to be simple shifts (powers of two); thus they are not considered to be true coefficients requiring hardware multipliers. All the multiplications of coefficients by state node or input signals can occur immediately after each sampling instant and therefore fall in multiplier precedence class 1. Thus the six $c_i$ multiplies and the five $d_i$ multiplies other than $d_1$ can operate in parallel, given enough hardware multiplier modules. Only the $d_1$ multiplication lies in class 2; it must await the completion of the $c_1$ multiply and one other $c_i$ multiply. Of course, given the two classes and 12 multiplies, an optimal (that is, maximal) use of the hardware is made with only 6 hardware multipliers (assuming no pipelining): five of the class 1 multiplies (but not $c_1$ or the other $c_i$ that $d_1$ awaits) would be computed in the second multiply cycle together with the $d_1$ multiply. Thus the cascade of figure 3-6 has two multiplier precedence classes, although it has four node precedence levels. Similarly, the cascade structure in figure 3-7 has only one multiplier precedence class, assuming power-of-two scalers, although its modified state space representation (3.23) shows three node precedence levels. If in fact general scalers are used in these two cascades, they will constitute multiplier coefficients, and the number of multiplier precedence classes and node precedence levels will be the same. No matter what type of scalers are used, the parallel structure of figure 3-8 has the same number of multiplier classes as it has node precedence levels; even so, the coefficients of multiplier class 1 ($c_1$ through $c_6$ and three of the $e_i$, including $e_2$) are not simply the coefficients in the first-level matrix. The remaining three $e_i$ coefficients belong to multiplier class 2 because they must await the completion of the $c_1$ through $c_6$ multiplies. This notion of multiplier precedence is more completely formulated in [31], but the basic conclusion is as follows: although the modified state space representation correctly describes the operations that must occur in computing the node values within a structure and has other useful properties (see Chapter 3), the multiplier precedence relations (more easily seen directly from the signal flow graph) are more significant for determining the possible hardware architectures when the multiply time is dominant.
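The six-multiplier argument for the cascade of figure 3-6 is an instance of resource-constrained list scheduling. The Python sketch below assumes unit-time multiplies and identical hardware multipliers; the helper `list_schedule` is hypothetical, the coefficient labels are placeholders, and the choice of $c_2$ as the second multiply that $d_1$ waits on is an assumption. It issues ready multiplies greedily, favoring those that later multiplies depend on.

```python
import math


def list_schedule(deps, num_multipliers):
    """Greedy list scheduling of unit-time multiplies on identical multipliers.

    deps maps each multiply to the multiplies whose products it needs.
    Among the ready multiplies, those that other multiplies wait on are
    issued first, so later precedence classes are not starved.
    Returns one list of multiply names per multiply cycle.
    """
    waited_on = {m: sum(m in preds for preds in deps.values()) for m in deps}
    done, cycles = set(), []
    while len(done) < len(deps):
        ready = [m for m in deps
                 if m not in done and all(p in done for p in deps[m])]
        ready.sort(key=lambda m: -waited_on[m])   # stable: ties keep input order
        issued = ready[:num_multipliers]
        cycles.append(issued)
        done.update(issued)
    return cycles


# Hypothetical labels: six c multiplies, five d multiplies needing no other
# product, and d1, which waits on c1 and (by assumption) c2.
deps = {f"c{i}": [] for i in range(1, 7)}
deps.update({f"d{i}": [] for i in range(2, 7)})
deps["d1"] = ["c1", "c2"]

cycles6 = list_schedule(deps, 6)
cycles5 = list_schedule(deps, 5)
# With unlimited multipliers the cycle count equals the number of classes.
num_classes = len(list_schedule(deps, len(deps)))
bound6 = max(num_classes, math.ceil(len(deps) / 6))
print(len(cycles6), len(cycles5), bound6)   # 2 3 2
```

Six multipliers meet the lower bound of two cycles; with five, the twelve multiplies need three cycles even though there are still only two precedence classes.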


§4.2 Restrictions on Pipelining

Certain basic restrictions [31] must be observed when pipelining a complex structure. The first limitation in applying pipelining concerns parallel data paths within the structure. Whenever any portion of a system is pipelined to increase the sampling rate (which adds effective delay), all parts of the system that feed forward in parallel with the pipelined portion must receive equivalent actual delay in order to maintain the desired transfer function. In other words, the data flowing through the system must remain synchronized whether or not pipelining is applied. Consider the second-order digital filter of figure 4-4a. A direct pipelining of this structure by adding a unit delay preceding the $r_0$ multiplier, as done with figure 4-3a, will result in a very different transfer function than the original one. To preserve the desired transfer function, except for series delay, unit delays must also be inserted in the two other parallel feedforward branches. This new (one-level) structure appears in figure 4-4b but is not node-minimal. Figure 4-4c shows an equivalent node-minimal structure, requiring only one additional state instead of three. Its modified state space representation is given in (4.5).
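The synchronization requirement can be seen numerically. The Python sketch below uses a hypothetical second-order direct form II section, which is not necessarily the structure of figure 4-4: the $r_0$ path depends on the newly computed recursion node and so forms a second level, while the $r_1$ and $r_2$ paths feed forward in parallel with it. Delaying only the $r_0$ path alters the response; delaying every parallel path reproduces the original response one sample later.

```python
def df2_section(y, a1, a2, r0, r1, r2, mode="original"):
    """Hypothetical second-order direct form II section (not exactly figure 4-4).

    w(k) = y(k) - a1 w(k-1) - a2 w(k-2)                  recursive part
    original: u(k) = r0 w(k)   + r1 w(k-1) + r2 w(k-2)    r0 path is level 2
    naive:    u(k) = r0 w(k-1) + r1 w(k-1) + r2 w(k-2)    delay in r0 path only
    balanced: u(k) = r0 w(k-1) + r1 w(k-2) + r2 w(k-3)    delay in every path
    """
    past = [0.0, 0.0, 0.0]            # w(k-1), w(k-2), w(k-3)
    u = []
    for yk in y:
        wk = yk - a1 * past[0] - a2 * past[1]
        if mode == "original":
            u.append(r0 * wk + r1 * past[0] + r2 * past[1])
        elif mode == "naive":
            u.append(r0 * past[0] + r1 * past[0] + r2 * past[1])
        elif mode == "balanced":
            u.append(r0 * past[0] + r1 * past[1] + r2 * past[2])
        else:
            raise ValueError(mode)
        past = [wk, past[0], past[1]]
    return u


y = [1.0] + [0.0] * 11                       # unit impulse
coeffs = dict(a1=-0.5, a2=0.25, r0=1.0, r1=0.4, r2=-0.3)
orig = df2_section(y, **coeffs)
delayed = [0.0] + orig[:-1]                  # z^-1 times the original response
balanced = df2_section(y, mode="balanced", **coeffs)
naive = df2_section(y, mode="naive", **coeffs)
print(balanced == delayed, naive == delayed)  # True False
```

The balanced version keeps three past values of w, mirroring the non-node-minimal structure of figure 4-4b; the node-minimal version of figure 4-4c would merge equivalent registers.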
The second difficulty encountered in applying pipelining techniques involves feedback. Suppose there exists a series of operations which makes up part of a closed feedback loop within a structure. Pipelining these operations would result (as with the previous example) in a very different transfer function. Consider the filter of figure 4-5. Its transfer function and two-level modified state space representation are shown in equations (4.6) and (4.7):


$$H(z) = \frac{r_0 z^{-1}}{1 + (c_1 - r_0) z^{-1}} \qquad (4.6)$$
(4.7)


If we pipeline by inserting a delay preceding $r_0$ (or, equivalently, by moving the $r_0$ branch to a state node), the modified state representation will indeed show only one level:


$$\begin{bmatrix} -c_1 & 1 & 1 \\ r_0 & 0 & 0 \end{bmatrix} \qquad (4.8)$$


However, the overall transfer function is now quite different: 
(4.9)



Figure 4-5: Filter with Output Feedback
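Inside a feedback loop, the inserted register does more than delay the output. The Python sketch below assumes one hypothetical first-order loop whose sign convention was chosen so that its transfer function matches (4.6); the structure is not taken from figure 4-5. It shows that pipelining inside the loop changes the recursion itself: the pipelined response is not a delayed copy of the original, and it obeys a second-order difference equation, so a new pole has appeared.

```python
def loop_filter(y, c1, r0, pipelined=False):
    """Hypothetical first-order loop chosen so its transfer function is (4.6):

        v(k+1) = -c1 v(k) + r0 (y(k) + v(k)),   u(k) = v(k)
        =>  H(z) = r0 z^-1 / (1 + (c1 - r0) z^-1)

    pipelined=True puts a register ahead of r0, i.e. inside the feedback loop.
    """
    v, reg = 0.0, 0.0              # reg holds y(k-1) + v(k-1) when pipelined
    u = []
    for yk in y:
        u.append(v)
        s = yk + v
        if pipelined:
            v, reg = -c1 * v + r0 * reg, s
        else:
            v = -c1 * v + r0 * s
    return u


c1, r0 = 0.5, 0.3
y = [1.0] + [0.0] * 14                      # unit impulse
h = loop_filter(y, c1, r0)
hp = loop_filter(y, c1, r0, pipelined=True)
delayed = [0.0] + h[:-1]
print(hp == delayed)                        # False: not just a series delay

# The pipelined loop obeys u(k) = -c1 u(k-1) + r0 u(k-2) + r0 y(k-2),
# i.e. H(z) = r0 z^-2 / (1 + c1 z^-1 - r0 z^-2) under this sign convention.
resid = max(abs(hp[k] - (-c1 * hp[k - 1] + r0 * hp[k - 2] + r0 * y[k - 2]))
            for k in range(2, len(y)))
print(resid < 1e-12)                        # True
```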
would seem to be necessary unless we were willing to drop the sampling rate to $1/T$. Unfortunately, the series delay that would result from pipelining this compensator would introduce an unplanned-for pure time delay. The deleterious effects of pure time delay (a negative phase shift that increases linearly with frequency) on the stability and phase margin of a feedback system are well known. Even if instability does not result, the performance index J will be larger than expected and the qualitative dynamic performance will be compromised.

Fortunately, there is an approach to pipelining that will be effective for control systems. Consider the LQG system and compensator design technique described in Chapter 2. Assume that for some original controller design, the sampling interval is not long enough to complete all the calculations involved in the compensator (which is the situation described above). In principle, pipelining techniques could help, but unavoidable delay would be introduced. An effective use of pipelining simply means that we somehow include this unavoidable delay in the original design procedure. This aim can be realized through state augmentation [1]. Suppose that pipelining would allow a factor of two increase in the sampling rate, thus adding only a single series delay. If the plant is described at the doubled sampling rate 2/T by (4.10):

$$x(k+1) = \Phi x(k) + \Gamma u(k) + w_1(k)$$
$$y(k) = L x(k) + w_2(k) \qquad (4.10)$$

(recall that the matrix parameters above depend on T), then, preceding u(k) with the series delay, whose input becomes the new control variable, the augmented plant can be modelled as follows (see figure 4-6):
(4.11)

For this augmented system, the weighting matrices Q and M in the expression for the performance index (2.6) must also be augmented, adding an all-zero row and column to Q, and a single zero element to M. The weighting parameter R will be the same as for the system (4.10). Now we must treat (4.11) as a new system and design an LQG compensator for it. That design can then be pipelined, which introduces the inherent added delay shown in figure 4-6.
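The augmentation step can be written out explicitly. The Python/NumPy sketch below assumes a single-input plant in the form (4.10) and takes the augmented state to be $[x(k);\, u(k)]$, with the new control entering the delay register; the helper name `augment_for_pipeline_delay` and the numerical values are illustrative, not taken from the text. It builds the augmented matrices and weights, then checks by simulation that the augmented plant reproduces the original plant driven through one series delay.

```python
import numpy as np


def augment_for_pipeline_delay(Phi, Gamma, L, Q, M):
    """Augment a rate-2/T plant with one series delay ahead of the plant input.

    The new control v(k) enters the delay register; the register's output is
    the old plant input u(k) = v(k-1), which becomes the extra state x_{n+1}(k).
    Weights gain an all-zero row/column (Q) and a zero element (M).
    """
    n, m = Gamma.shape
    assert m == 1, "sketch assumes a single-input plant"
    Phi_a = np.block([[Phi, Gamma], [np.zeros((1, n)), np.zeros((1, 1))]])
    Gamma_a = np.vstack([np.zeros((n, 1)), np.ones((1, 1))])
    L_a = np.hstack([L, np.zeros((L.shape[0], 1))])
    Q_a = np.block([[Q, np.zeros((n, 1))], [np.zeros((1, n)), np.zeros((1, 1))]])
    M_a = np.vstack([M, np.zeros((1, M.shape[1]))])
    return Phi_a, Gamma_a, L_a, Q_a, M_a


h = 0.5                                    # illustrative sample period only
Phi = np.array([[1.0, h], [0.0, 1.0]])     # e.g. a discretized double integrator
Gamma = np.array([[h * h / 2], [h]])
L = np.array([[1.0, 0.0]])
Q, M = np.eye(2), np.zeros((2, 1))
Phi_a, Gamma_a, L_a, Q_a, M_a = augment_for_pipeline_delay(Phi, Gamma, L, Q, M)

rng = np.random.default_rng(0)
v = rng.standard_normal(20)
x, xa, u_prev = np.zeros((2, 1)), np.zeros((3, 1)), 0.0
max_err = 0.0
for k in range(20):
    max_err = max(max_err, np.abs(xa[:2] - x).max(), abs(xa[2, 0] - u_prev))
    x = Phi @ x + Gamma * u_prev           # plant sees the delayed input v(k-1)
    xa = Phi_a @ xa + Gamma_a * v[k]       # augmented plant is driven by v(k)
    u_prev = v[k]
print(max_err)                             # round-off level only
```

Noise terms are omitted in the check; the augmented model would carry $w_1$ into the first n states only, since the delay register is noise-free.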



Figure 4-6: State Augmentation for Control System Pipelining


For this situation, two observations can be made. First, the Kalman filter portion of the LQG design for (4.11) will have what seems to be a difficulty due to the added delay: the numerical routines blow up. Common sense dictates, however, that there is no need to estimate $x_{n+1}(k) = u(k)$, since it is the actual plant input, which is known. Thus we need only estimate $x_1(k)$ through $x_n(k)$, namely the vector x(k). That estimation problem has already been solved as the nth-order Kalman filter for (4.10), with gains $\lambda_1$ through $\lambda_n$. Using these results, the optimal filtering gains for the augmented system (4.11) can be written:

$$\lambda_a = \begin{bmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_n & 0 \end{bmatrix}^T \qquad (4.12)$$

The (n+1)th-order optimal regulator problem for (4.11) can be solved with no difficulty at all.
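The zero entry in (4.12) means the augmented filter gain can be assembled from the nth-order design instead of being computed from scratch. The Python/NumPy sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, the steady-state gain is obtained by simply iterating the standard Riccati recursion (a filter-form gain, $\hat{x}(k|k) = \bar{x}(k) + K(y(k) - L\bar{x}(k))$, which may differ from the exact form used in Chapter 2), and the noise covariances are illustrative numbers.

```python
import numpy as np


def steady_state_filter_gain(Phi, L, W, V, iters=2000):
    """Steady-state Kalman filter gain by iterating the Riccati recursion.

    W and V are discrete process and measurement noise covariances.
    Returns K for the update xhat(k|k) = xbar(k) + K (y(k) - L xbar(k)).
    """
    n = Phi.shape[0]
    P = np.eye(n)                                   # a-priori covariance guess
    for _ in range(iters):
        K = P @ L.T @ np.linalg.inv(L @ P @ L.T + V)
        P = Phi @ (np.eye(n) - K @ L) @ P @ Phi.T + W
    return P @ L.T @ np.linalg.inv(L @ P @ L.T + V)


def augmented_filter_gain(K):
    """Append the zero gain for x_{n+1}(k) = u(k), which is known exactly."""
    return np.vstack([K, np.zeros((1, K.shape[1]))])


h = 0.5                                     # illustrative values only
Phi = np.array([[1.0, h], [0.0, 1.0]])
L = np.array([[1.0, 0.0]])
W = np.diag([0.01, 0.02])
V = np.array([[0.1]])
K = steady_state_filter_gain(Phi, L, W, V)
Ka = augmented_filter_gain(K)
print(Ka.ravel())
```

Building the augmented gain this way sidesteps the ill-posed (n+1)th-order filter problem, since the delay-register state has no noise and needs no estimate.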


The second observation that we can make for this augmented-system pipelining technique involves the consistency of the design technique. A delay-canonic structure for the optimal LQG compensator for (4.11) will be of order n+2, since (4.11) is of order n+1; this is one order higher than the canonic compensator structure for (4.10), which is of order n+1. Thus this approach to controller pipelining gives rise to a compensator of higher dimension (more poles), requiring more states (delay elements) and more coefficients. Along with this increase in order comes a more important point: the new higher-dimensional compensator structure must allow the same degree of pipelining as the original structure, or the whole controller pipelining design procedure is invalid, that is, inconsistent. This point is especially of concern when using structures whose number of precedence levels is a function of the number of compensator states (for example, the cascade forms). As an example, consider a second-order plant and a direct form II compensator structure, which requires three delays and two precedence levels. To exploit pipelining, we must augment the plant and redesign the compensator; its direct form II structure now requires four delays (states). There would still be only two (node or multiplier) precedence levels as before, so pipelining to double the sampling rate will work as planned. However, if we decide to use a cascade of two direct form sections (assume one second-order section, one first-order section, and general non-power-of-two scaling multipliers), then the result is three precedence levels. Pipelining to allow the 2/T sampling rate will not now result in the effect of a single added unit delay as assumed, but will involve two series unit delays, making the design procedure invalid. In other words, if we implemented the pipeline as described above, the system would not perform as expected; more delay would be present in the loop than had been accounted for in the design. Such problems can be avoided with a proper choice of structure.

There is one positive note associated with the increased dimensionality of the compensator, and it is related to the particular form of (4.12). Usually, an increase in dimension (number of states) by one involves at least two additional coefficient multipliers. (A fifth-order plant requires a compensator with at least 10 coefficients, a sixth-order plant requires one with 12 coefficients, and so on; see figure 3-6.) However, by virtue of the zero entry in (4.12), the general form of the compensator transfer function for the augmented system is simpler:

$$H(z) = \frac{a_2 z^{-2} + a_3 z^{-3} + \cdots + a_{n+1} z^{-(n+1)}}{1 + b_1 z^{-1} + \cdots + b_{n+1} z^{-(n+1)}} \qquad (4.13)$$

Comparing (4.13) to (3.2) shows a difference of only one coefficient, not two. This fact helps make the pipelining approach a bit more attractive, at least with certain structures (for example, any direct form and any cascade or parallel structure based on a direct form).
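As a quick worked check of the "one coefficient, not two" claim, take the 2n count quoted above for the compensator of an nth-order plant and count the free coefficients of (4.13) directly:

$$
\begin{aligned}
\text{nth-order plant (quoted above):}&\quad 2n,\\
\text{order-}(n+1)\text{ plant, generic form:}&\quad 2(n+1) = 2n+2,\\
\text{(4.13):}&\quad \underbrace{n}_{a_2,\dots,a_{n+1}} + \underbrace{n+1}_{b_1,\dots,b_{n+1}} = 2n+1.
\end{aligned}
$$

For the sixth-order case this is 13 coefficients rather than the 14 a generic seventh-order compensator would need, against 12 without pipelining.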
One last general point should be mentioned. The application of any pipelining technique or the use of parallelism to increase the sampling rate is desirable only if it allows a decrease in the performance index J, or in whatever gauge of system performance one accepts. However, not all systems have a performance measure that decreases (improves) monotonically with decreasing T [26]. Intuitively, any system with sharp resonances will lose controllability (implying a large J) when the sampling frequency is near a resonance. One must be aware of such cases. If such a case does not occur, then pipelining will reduce the performance index, although certainly not as much as the (non-implementable) straightforward rate-2/T LQG compensator design, which adds no delay. Whether this pipelining approach is effective enough to warrant the higher-order compensator depends on the designer's particular application.

§4.4 Controller I/O Pipelining

One common application of pipelining in a feedback environment involves the often time-consuming compensator input/output (I/O) operations, namely, the sampling and the A/D and D/A conversion operations. Let us assume that a structure with one multiplier precedence level (for example, the block optimal parallel structure of (3.27)) is chosen to implement a compensator, and that a totally-parallel architecture is used for the multipliers involved. The compensator can then be modelled as a three-process task (figure 4-7). With no pipelining, the minimum sampling period T equals $t_1 + t_2 + t_3$ seconds. Assume that the slowest process is the multiply time and that $t_2 = t_1 + t_3 = T/2$. If we now pipeline these three processes, a factor of two increase in throughput and sampling rate is possible. (Throughput rate is limited by the slowest process.) At each sample time, sampling and A/D conversion of a new y sample would begin. Then $t_1$ seconds later the structure multiplications could begin, overlapping the next sampling and A/D operation. Figure 4-8 diagrams the processes occurring in such an I/O pipelined compensator with increased sampling rate. (Note that the hardware multipliers will now be active 100% of the time.) We can represent this pipelined system as the designed compensator structure followed by a series unit delay resulting from the pipeline. Since part of this unit delay is involved in buffering the intermediate A/D results and the rest is involved in buffering the multiplier results from the structure, two hardware storage registers will be required for this example. However, their clock signals will be staggered, since the three operations of figure 4-8 take different amounts of time to complete. Basically, these clock signals (all of period T/2) must be phased so that the results from each process are stored as soon as they are completed. Thus register 1 is clocked by sample pulses delayed by $t_1$ seconds, and register 2 is clocked by sample pulses delayed by $t_1 + t_2$ seconds. (This phasing is shown as fractional delay time in the simple example of figure 4-9.)

Figure 4-7: Three-Process Compensator Model

Figure 4-8: Concurrency of Processes in I/O Pipelined Compensator
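The timing relations of this paragraph reduce to a few lines of arithmetic. The Python sketch below assumes the three processes of figure 4-7 are sampling plus A/D ($t_1$), the structure multiplies ($t_2$), and D/A ($t_3$), each on its own hardware; the helper `io_pipeline_timing` is hypothetical, and the numbers are illustrative values chosen to satisfy $t_2 = t_1 + t_3 = T/2$.

```python
def io_pipeline_timing(t1, t2, t3):
    """Timing of the three-process compensator of figure 4-7 (a sketch).

    t1: sampling + A/D, t2: structure multiplies, t3: D/A (seconds).
    Each process has its own hardware, so once pipelined the sample period
    is set by the slowest process.
    """
    period_plain = t1 + t2 + t3            # minimum period without pipelining
    period_pipe = max(t1, t2, t3)          # throughput limited by slowest process
    return {
        "period_plain": period_plain,
        "period_pipe": period_pipe,
        "speedup": period_plain / period_pipe,
        "reg1_delay": t1,                  # latch A/D result after each sample pulse
        "reg2_delay": t1 + t2,             # latch multiplier results
        "multiplier_busy": t2 / period_pipe,
    }


timing = io_pipeline_timing(t1=0.125, t2=0.25, t3=0.125)   # t2 = t1 + t3 = T/2
print(timing)
```

With these values the pipelined period is T/2, the multipliers are busy 100% of the time, and the two registers latch $t_1$ and $t_1 + t_2$ seconds after each sample pulse, as described above.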

If we apply the design technique outlined in section 4.3 to produce a (pipelineable) compensator for this I/O case, the order of the compensator will of course be one greater than that of the non-pipelined design, implying at least one additional state and coefficient. No matter what the plant dimension may be, a block optimal parallel structure (or any state-space structure; see section 3.3) will have only one precedence level. Thus, I/O pipelining with a one-level compensator structure results in a valid design procedure.

§4.5 Compensator I/O Pipelining Examples

Four examples have been selected to illustrate what can occur with compensator (I/O) pipelining. Each example consists of four cases. Case 1 represents the plant discretized at a T-second sampling period with its corresponding LQG compensator (no pipeline). Case 2 represents the plant discretized at a T/2-second sampling period with its corresponding LQG compensator. This case does not include any pipelining, but is not physically implementable due to the short sampling interval. The performance index for this case constitutes an unreachable lower bound to the performance of the augmented-plant approach to pipelining (case 3). Case 4 (blind pipelining) results when the compensator designed for case 2 is pipelined in order to make it physically implementable. Thus the delay due to the pipeline is ignored in the pipelined design, usually resulting in a performance level that is worse than the non-pipelined level (and perhaps even in a system that is unstable). Assuming that J is a monotonically increasing function of T, we can expect that the different cases will rank, from highest J to lowest, as follows: case 4, case 1, case 3, case 2. (It is possible but unlikely that case 4 could have a lower J value than case 1.) Remember, however, that case 2 is not implementable.

The simplest I/O pipelining example consists of a single-input, single-output, single-integrator plant:

$$\dot{x}[t] = u[t] + w_1[t]$$
$$y[t] = x[t] + w_2[t] \qquad (4.14)$$

where T = 6 seconds. Referring to Chapter 2, equations (2.1)-(2.3), the parameters Q and R were both chosen to be 1, and the noise intensities $E_1$ and $E_2$ were selected to be 0.3 and 0.125. Figure 4-9 illustrates the discretized system and the form of the compensator before pipelining (case 1) and after pipelining through state augmentation and redesign (case 3). A one-level version of the direct form II structure (obtained from the matrix of the direct form II, as mentioned in section 3.3) is used for the compensator. Note the inclusion of the two fractional delays (registers) in figure 4-9b, as mentioned earlier in this section. The form of the system for case 2 would look the same as that in figure 4-9a; however, the gains of all the branches would differ. For case 4, we need only add one series delay to the signal flow graph of case 2.

Chapter 4: Architectural Issues: Serialism, Parallelism, and Pipelining

Figure 4-9: Compensator I/O Pipelining for the Single-Integrator Plant ((a) rate 1/T system, T = 6, case 1)

Three other examples are also considered: a double-integrator plant, a two-state harmonic oscillator plant, and a sixth-order plant derived from the longitudinal dynamics of the F8 fighter aircraft (see Chapter 6 and Appendix A). The continuous-time parameters of the double-integrator system are shown below:

$$\dot{x}(t) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t) + w_1(t), \qquad y(t) = \begin{bmatrix} 1 & 0 \end{bmatrix} x(t) + w_2(t) \qquad (4.15)$$

For this system, the continuous-time parameter $Q$ was a $2 \times 2$ identity matrix, $R$ was 1, $\Xi_1$ was the diagonal $2 \times 2$ matrix $\mathrm{diag}(0.2, 0.3)$, and $\Xi_2$ was 0.126. For the harmonic oscillator, all the parameters were the same as for the double-integrator system, except for the $A$ matrix, which is given below:

$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \qquad (4.16)$$
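For reference, the deterministic parts of the three low-order plants have closed-form zero-order-hold (ZOH) discretizations. The sketch below assumes ZOH sampling and $B = [0 \; 1]^T$ for both two-state plants (as stated above for the harmonic oscillator). It omits the discretization of $Q$ and of the noise intensities, which the LQG design of Chapter 2 also needs. Case 1 uses $T = 6$; cases 2-4 use $T/2 = 3$. The function names are illustrative only.

```python
import math


def zoh_single_integrator(T):
    # x' = u  ->  x(k+1) = x(k) + T u(k)
    return [[1.0]], [[T]]


def zoh_double_integrator(T):
    # x1' = x2, x2' = u
    return [[1.0, T], [0.0, 1.0]], [[T * T / 2.0], [T]]


def zoh_harmonic_oscillator(T):
    # x1' = x2, x2' = -x1 + u: Phi = e^{AT}, Gamma = integral of e^{As} B over [0, T]
    c, s = math.cos(T), math.sin(T)
    return [[c, s], [-s, c]], [[1.0 - c], [s]]


for T in (6.0, 3.0):  # case 1 period and the pipelined (T/2) period
    Phi, Gam = zoh_harmonic_oscillator(T)
    print(T, Phi, Gam)
```

At $T = 6$ the oscillator's 1 rad/s mode is sampled at $\omega_s = 2\pi/6 \approx 1.05$ rad/s, barely above the mode itself, which is one way to see why halving the period matters for this plant.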


The performance indices for all the various cases are shown in figure 4-10.

Key:
Case 1: rate 1/T system
Case 2: rate 2/T system (not implementable)
Case 3: rate 2/T pipelined system designed via state augmentation
Case 4: blind pipelining

Example plant          T   Case 4       Case 1    Case 3    Case 2
single integrator      6   (unstable)   2.42      2.06      1.34
double integrator      6   (unstable)   328       179       63.2
harmonic oscillator    6   (unstable)   32.7      12.9      9.72
6-state F8 plant       1   .0038        .00312    .00282    .00222

Figure 4-10: Compensator I/O Pipelining


Under case 4 we see the consequences of pipelining and ignoring the delay incurred. Three of the example systems actually became unstable, and with the fourth, the index $J$ increased. As expected, all the case 2 indices were lower than those of case 1, with case 3 lying between the two. To judge the effectiveness of the state-augmentation pipelining method of case 3, one must examine the degree of improvement in $J$ relative to the possible improvement (the difference between cases 1 and 2). The best improvement shown was for the harmonic oscillator, which is no surprise, since the oscillator's natural frequency of 1 radian/second ($1/2\pi \approx 0.16$ Hz) is close to the unpipelined sampling rate $1/T \approx 0.17$ Hz. The remaining three examples also showed significant improvement. Again, whether or not the pipelineable compensator (with one extra state and at least one extra coefficient) is to be used will depend on the particular level of performance desired and the penalty involved in complicating the hardware.
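The "degree of improvement relative to the possible improvement" can be made concrete as the fraction $(J_1 - J_3)/(J_1 - J_2)$, computed below from the values in figure 4-10. This normalized measure is a convenient shorthand for the comparison described in the text, not a quantity defined in the thesis.

```python
# Performance indices from figure 4-10, stored as (case 1, case 3, case 2)
J = {
    "single integrator": (2.42, 2.06, 1.34),
    "double integrator": (328.0, 179.0, 63.2),
    "harmonic oscillator": (32.7, 12.9, 9.72),
    "6-state F8 plant": (0.00312, 0.00282, 0.00222),
}


def improvement_fraction(j1, j3, j2):
    """Fraction of the achievable improvement (case 1 -> case 2) realized by case 3."""
    return (j1 - j3) / (j1 - j2)


for plant, (j1, j3, j2) in J.items():
    print(f"{plant:20s} {improvement_fraction(j1, j3, j2):.2f}")
```

This gives about 0.33, 0.56, 0.86, and 0.33 respectively, so the harmonic oscillator recovers the largest share of the improvement that an ideal rate-2/T design would offer.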


§4.6 Summary

To summarize this chapter briefly: section 4.1 introduced the architectural notions of serialism, parallelism, and pipelining, and explained the hardware cost/execution time tradeoff tied to these issues. The issues of serialism and parallelism were shown to involve the same considerations for digital compensators as for digital filters. Section 4.2 discussed the limitations of pipelining techniques, especially the one concerning pipelining in a closed loop (feedback). The extra delay incurred through the use of pipelining had a deleterious effect on the performance of the feedback system. This problem made the consideration of pipelining for feedback compensators very different from the case of digital filters. Section 4.3 developed a design technique based on state augmentation for dealing with the problem of control system pipelining. Finally, the last section treated a typical application of pipelining techniques to microprocessor-based control systems. For this application, the compensator input/output operations and multiply operations could be pipelined to realize a doubling in the system sampling rate. Four examples were presented to illustrate the technique.


Chapter 5: Finite Wordlength Effects: Quantization Noise
§5.1 Introduction

One of the major implications of the use of finite wordlengths within a compensator is the necessity of having the nonlinear operations of quantization and overflow in the structure. First, the input A/D unit must convert an analog signal to a fixed-point representation with a specific number of bits $n_a$. (Commercially available units typically produce 6, 8, 10, 12, or 16 bits.) This procedure involves an implicit quantization of the input level to one of the set of possible $n_a$-bit words and constitutes an approximation (a source of error). The remainder of a structure's quantizers are required by the multiply operations within the structure. Given $n_f$-bit digital words for the node signal variables, any multiplication by an $n_c$-bit coefficient produces an $(n_f + n_c)$-bit product. To store this result in an $n_f$-bit (state) storage register, or to serve as an $n_f$-bit input to another multiplier, requires a quantizing operation. Furthermore, the addition of two $n_f$-bit fixed-point words could produce an extra significant bit, which requires another nonlinear operation to keep the wordlength at $n_f$ bits. Discussion of such overflow nonlinearities will be deferred to Chapter 7.
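The wordlength bookkeeping above can be illustrated with integer arithmetic. In this sketch the representation is an assumption: signed fractions stored as integer counts of $2^{-n_f}$ (signals) and $2^{-n_c}$ (coefficients). A product then carries $n_f + n_c$ fractional bits and must be requantized, here by rounding, before it fits an $n_f$-bit register; the helper names are hypothetical.

```python
def to_fixed(x, frac_bits):
    """Integer count of 2**-frac_bits nearest to x (assumed fractional format)."""
    return round(x * (1 << frac_bits))


def requantize(value, from_bits, to_bits):
    """Round an integer with from_bits fractional bits to to_bits fractional bits.

    Adds half an LSB of the target format, then arithmetic-shifts right
    (Python's >> floors, so this is round-half-up for both signs).
    """
    shift = from_bits - to_bits
    return (value + (1 << (shift - 1))) >> shift


n_f, n_c = 8, 6                    # hypothetical signal and coefficient wordlengths
signal = to_fixed(0.7, n_f)        # 179, i.e. 179/256
coeff = to_fixed(-0.45, n_c)       # -29, i.e. -29/64
product = signal * coeff           # carries n_f + n_c = 14 fractional bits
stored = requantize(product, n_f + n_c, n_f)   # back to an n_f-bit register
print(product / 2 ** (n_f + n_c), stored / 2 ** n_f)

# Addition can also grow the word: 0.75 + 0.5 leaves the range [-1, 1)
total = to_fixed(0.75, n_f) + to_fixed(0.5, n_f)
print(total / 2 ** n_f >= 1.0)     # True: an overflow condition
```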

The A/D and multiplier quantizations mentioned above introduce two types 
of undesirable effects, classifiable as periodic and random. The periodic effects 
(limit cycle oscillations) will be treated in Chapter 7. The random effects, quanti- 
zation noise, are the subject of this chapter. 

Several distinctions can be made when referring to quantization noise. First, the storage registers (and quantizers) within a structure may have different, nonuniform wordlengths; such a structure will always perform at least as well in terms of roundoff noise effects as the constrained case of uniform wordlengths [All]. However, by using uniform wordlengths, the hardware expense and complexity will be greatly reduced, and often little potential performance is lost by such a restriction. Since the A/D converter is usually a separate piece of hardware, little affected by the remaining compensator hardware architecture and design, it need not be subject to this restriction. Consequently, A/D and internal wordlengths can and typically do differ. We will assume that the signal variable registers are of uniform wordlength, and that the A/D wordlength can be different from the internal compensator wordlength.

The second distinction is in the placement of the structure's quantizers. On one hand, they can be inserted after every multiplication, so that the ensuing adders have to deal only with $n_f$-bit quantities. However, if we are willing to complicate the adders, quantization can be delayed until after the node additions, placing the quantizers just before each storage register or intermediate node value $r_i(k)$. With this method, adders would have to sum $(n_f + n_c)$-bit quantities, but fewer quantizers are needed. This alternative trades off hardware complexity (double- versus single-precision adders) against quantization noise (fewer quantizers imply fewer noise sources). Both these options will be considered in this chapter.
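Under the additive white-noise model described later in this chapter, each rounding quantizer contributes an independent error of variance $\Delta^2/12$, so the two placements differ in how many such sources feed a node. The Monte Carlo sketch below assumes a hypothetical node that sums three products with arbitrary coefficients. It compares quantizing every product (three sources) with quantizing once after a double-precision sum (one source).

```python
import math
import random


def round_q(x, delta):
    """Round x to the nearest multiple of delta."""
    return delta * math.floor(x / delta + 0.5)


def mean_square(values):
    return sum(v * v for v in values) / len(values)


rng = random.Random(1)
delta = 2.0 ** -8
coeffs = [0.613, -0.287, 0.921]     # hypothetical coefficients feeding one node
err_each, err_once = [], []
for _ in range(50000):
    xs = [rng.uniform(-1.0, 1.0) for _ in coeffs]
    exact = sum(c * x for c, x in zip(coeffs, xs))
    # Option 1: a quantizer after every multiplication (3 noise sources)
    each = sum(round_q(c * x, delta) for c, x in zip(coeffs, xs))
    # Option 2: double-precision sum, one quantizer before the register
    once = round_q(exact, delta)
    err_each.append(each - exact)
    err_once.append(once - exact)

unit = delta ** 2 / 12              # variance of one white-noise source
ratio_each = mean_square(err_each) / unit
ratio_once = mean_square(err_once) / unit
print(ratio_each, ratio_once)
```

The two ratios come out near 3 and near 1, matching a simple count of noise sources.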

The final distinction in discussing quantization noise is in the type of quantizer used. Commonly, the choice is between rounding, which selects the finite-precision word that is closest to the ideal value, and truncating, which simply drops the extra bits of precision. Truncation, and specifically sign-magnitude truncation, has the advantage of requiring no extra hardware, and also an advantage in terms of the resulting (reduced) number of possible limit cycle oscillations. However, rounding can be shown to have the advantage of reduced quantization noise effects, and the extra hardware it requires is not very complex. In addition, roundoff effects are more easily analyzed. Consequently, this chapter will primarily focus on roundoff quantization. In Chapter 7, we will consider other approaches to quantization which provide advantages in terms of limit cycle behavior, that is, fewer limit cycles or limit cycles of smaller amplitude.
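The two quantizer characteristics can be compared directly. In this sketch, rounding is round-to-nearest (ties toward $+\infty$, an arbitrary choice) and sign-magnitude truncation drops low-order bits of the magnitude, i.e. truncates toward zero. The checks show the error bounds ($|e| \le \Delta/2$ versus $|e| < \Delta$), that truncation never increases a magnitude, and the roughly fourfold larger mean-square error of truncation.

```python
import math


def round_q(x, delta):
    """Rounding: nearest multiple of delta (ties toward +infinity)."""
    return delta * math.floor(x / delta + 0.5)


def smt_q(x, delta):
    """Sign-magnitude truncation: drop low-order magnitude bits (toward zero)."""
    return math.copysign(delta * math.floor(abs(x) / delta), x)


delta = 2.0 ** -4
xs = [k / 1000.0 for k in range(-1000, 1001)]
round_err = [x - round_q(x, delta) for x in xs]
trunc_err = [x - smt_q(x, delta) for x in xs]

print(max(abs(e) for e in round_err) <= delta / 2)       # True
print(max(abs(e) for e in trunc_err) < delta)            # True
print(all(abs(smt_q(x, delta)) <= abs(x) for x in xs))   # True: magnitude never grows

mse_round = sum(e * e for e in round_err) / len(xs)
mse_trunc = sum(e * e for e in trunc_err) / len(xs)
print(mse_trunc / mse_round)        # about 4 (delta^2/3 versus delta^2/12)
```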

This chapter is organized as follows. Section 5.2 will discuss the major issue of dynamic range and scaling as applied to digital filters. In section 5.3 we will adapt these ideas for digital control compensator scaling. For this adaptation we will have to consider the entire closed-loop system in determining the appropriate scaling for compensators. Set-point LQG configurations and their implications for the scaling issue will also be discussed. Section 5.4 will describe the roundoff and sign-magnitude truncation quantization characteristics and present models for analyzing their effects. Methods of analyzing roundoff noise effects using the model developed in section 5.4 will be treated in section 5.5. Using these procedures, section 5.6 will describe the minimum roundoff noise filter structures introduced by Mullis and Roberts [18,37,38] and Hwang [39], and will then adapt these results to derive minimum roundoff noise compensator structures. Finally, section 5.7 will demonstrate the procedures developed in this chapter for compensators by applying them to 10 candidate structures for implementing a specific control system.
§5.2 Dynamic Range Constraints

It is not meaningful to discuss quantization noise effects (proportional to the least significant bit of the signal word) without also considering the dynamic range of the signals within the structure. Our overall objective is to minimize the total number of bits necessary for the fixed-point digital words. Choosing a specific structure based on its required least significant bit size (quantization step size) is of little value unless the fixed-point words can represent the full dynamic range of the node signals while keeping overflows to a minimum. Thus, we must maximize the signal-to-noise ratio without incurring overflow. These aims can be accomplished through scaling. By scaling the coefficients of a structure, we can reduce the overall dynamic range of the signals within the structure and also normalize the maximum signal size (the overflow level) at each node. Once a structure is scaled, we can use the quantization step size as a valid basis for comparison with other structures which have been scaled using the same scaling procedure. (This section will present several of these scaling procedures.) Note that scaling alters neither the type of structure nor its ideal transfer function.

Consider the second-order filter of figure 5-1a. This structure has three states, implying three storage registers. Clearly, if the $w_2(k+1)$ node and the $y(k+1)$ output node do not overflow, then none of the node signals will overflow, since the other nodes are simply delayed versions of these two. Thus, scaling involves overflow constraints on these two nodes. Such constraints would be inequality constraints; that is, the signal magnitude must be less than the overflow level. Of course, too small a signal magnitude would result in higher quantization noise levels. Intuitively, we would like to alter the magnitudes of the signals at these two nodes just enough to prevent the occurrence of overflow, but without changing the filter transfer function. For example, to modify the signal magnitude at the $w_2(k+1)$ node, the input unity coefficient must be multiplied by some factor $k_1$, and then, to preserve the transfer function of the filter, the three coefficients $a_0$, $a_1$, and $a_2$ must be multiplied by $1/k_1$. Similarly, scaling the $y(k+1)$ node involves multiplying $a_0$, $a_1$, and $a_2$ by another factor $k_2$. The corresponding factor $1/k_2$ must then be absorbed by the output D/A converter to ensure an unchanged overall transfer function. The resulting scaled structure is shown in figure 5-1b.

Figure 5-1: Scaling a Second-Order Section ((a) unscaled; (b) scaled, with $d_i = a_i k_2 / k_1$)

An important choice must be made in selecting $k_1$ and $k_2$. Let us define optimal scaling to refer to that choice of scalers which satisfies the dynamic range constraints of the scaling procedure (inequality constraints) with equality. Thus, in general, such scalers will not be simple powers of two. For the example above, optimal scaling would result in a structure with 6 nontrivial multiplications instead of 5. Optimal scaling usually carries the advantage over non-optimal scaling (which results when the scalers are constrained to be simple powers of two to simplify the hardware) of reduced quantization noise effects, even with the extra noise sources caused by the additional scaling coefficients. Thus, scaling introduces another tradeoff between performance and hardware complexity.

Two basic methods exist for choosing the dynamic range constraints. The first is a deterministic norm-based method introduced by Jackson [55]. Define the $L_p$ norm of a digital frequency-domain transform $H(z)$ as follows:

$$\|H\|_p = \left[ \frac{1}{\omega_s} \int_0^{\omega_s} \left| H\!\left(e^{j\omega T}\right) \right|^p d\omega \right]^{1/p} \qquad (5.1)$$

where $\omega_s = 2\pi/T$ is the sampling frequency in radians per second. If $F_i(z)$ is defined to be the transfer function from the input to the node that must be scaled, then Jackson has used the fact that:

$$|r_i(k)| \le \|F_i\|_p \, \|U\|_{p_0} \quad \text{for } \frac{1}{p} + \frac{1}{p_0} = 1, \; p, p_0 \ge 1, \text{ and for all } k \qquad (5.2)$$

where $r_i(k)$ is the signal at the node to be scaled, and $U(z)$ is the z-transform of the filter input $u(k)$. Note that when this inequality is applied to $u(k)$ itself ($r_i(k) = u(k)$, $F_i(z) = 1$), we find that $|u(k)| \le M_0$ if $\|U\|_{p_0} \le M_0$ for any $p_0 \ge 1$.
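The $L_p$ norm definition above can be evaluated numerically by averaging $|H|^p$ over a uniform grid covering one period of the unit circle; in normalized frequency $\theta = \omega T$ the factor $\omega_s$ drops out. The sketch assumes a hypothetical first-order $F(z) = 1/(1 - a z^{-1})$ with $a = 0.9$, for which Parseval's relation gives the closed-form check $\|F\|_2 = 1/\sqrt{1 - a^2}$ and the peak gives $\|F\|_\infty = 1/(1 - a) = 10$.

```python
import cmath
import math


def lp_norm(H, p, n=20000):
    """Approximate the normalized L_p norm on n midpoints of one period.

    Working in normalized frequency theta = omega*T makes omega_s drop out.
    """
    mags = [abs(H(cmath.exp(2j * math.pi * (k + 0.5) / n))) for k in range(n)]
    if math.isinf(p):
        return max(mags)
    return (sum(m ** p for m in mags) / n) ** (1.0 / p)


a = 0.9  # hypothetical pole of F(z) = 1 / (1 - a z^-1)


def F(z):
    return 1.0 / (1.0 - a / z)


L1, L2, Linf = lp_norm(F, 1), lp_norm(F, 2), lp_norm(F, math.inf)
print(L1, L2, Linf)                            # L1 <= L2 <= Linf
print(abs(L2 - 1.0 / math.sqrt(1.0 - a * a)))  # Parseval check: near zero
```

With this normalized (averaged) definition the norms are nondecreasing in $p$, so $\|F\|_1 \le \|F\|_2 \le \|F\|_\infty$.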

Now let us return to the scaling issue for node $i$. Assume that the maximum signal magnitude possible in the filter without overflow is $M_0$. Further assume that $\|U\|_{p_0} \le M_0$, and thus $u$ never overflows (its magnitude is always $\le M_0$). Then, using (5.2), the node signal $r_i$ will not overflow if:

$$\|F_i\|_p \le 1 \quad \text{for all } i \qquad (5.3)$$

This scaling rule, $L_p$ scaling, must be satisfied at every node in the filter structure. Satisfying this rule with equality corresponds to optimal scaling as described above. For the example of figure 5-1, the scaling multipliers $k_1$ and $k_2$ must be chosen to satisfy (5.3) for $i = 1$ and $i = 2$.
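Optimal $L_2$ scaling of the second-order section in the scaling example can be sketched in the time domain, since by Parseval's relation the $L_2$ norm equals the $\ell_2$ norm of the node's impulse response. The coefficient values and sign convention below are assumptions for illustration (the figure's own values are not given): the $w$ node obeys $w(k) = k_1 u(k) - b_1 w(k-1) - b_2 w(k-2)$, and the output node forms $c_0 w(k) + c_1 w(k-1) + c_2 w(k-2)$.

```python
import math


def impulse_response(k_in, b1, b2, coeffs, n=2000):
    """Impulse responses of the w node and output node of a direct form II section.

    Assumed (hypothetical) convention:
        w(k) = k_in*u(k) - b1*w(k-1) - b2*w(k-2)
        y(k) = c0*w(k) + c1*w(k-1) + c2*w(k-2)
    An extra output delay would not change any norm.
    """
    w1 = w2 = 0.0
    w_seq, y_seq = [], []
    for k in range(n):
        u = 1.0 if k == 0 else 0.0
        w = k_in * u - b1 * w1 - b2 * w2
        y = coeffs[0] * w + coeffs[1] * w1 + coeffs[2] * w2
        w_seq.append(w)
        y_seq.append(y)
        w1, w2 = w, w1
    return w_seq, y_seq


def l2(seq):
    return math.sqrt(sum(v * v for v in seq))


b1, b2 = -1.6, 0.81         # hypothetical poles at radius 0.9
a = [0.05, 0.10, 0.05]      # hypothetical numerator a0, a1, a2

f1, h = impulse_response(1.0, b1, b2, a)   # unscaled node responses
k1 = 1.0 / l2(f1)                           # equality in the L2 rule at the w node
k2 = 1.0 / l2(h)                            # equality in the L2 rule at the output node
d = [ai * k2 / k1 for ai in a]              # d_i = a_i k2 / k1

w_s, y_s = impulse_response(k1, b1, b2, d)  # scaled structure
y_out = [v / k2 for v in y_s]               # D/A absorbs the factor 1/k2
print(l2(w_s), l2(y_s))                     # both 1 (to rounding)
print(max(abs(p - q) for p, q in zip(y_out, h)))  # ~0: transfer function unchanged
```

Because both node norms are driven exactly to 1, this is the "optimal" case; restricting $k_1$ and $k_2$ to powers of two would leave the norms below 1.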

The scaling rule described above still allows some degree of freedom even for optimal scaling, namely the choice of $p_0$ and $p$. If all we know about the input $u$ is that its magnitude will be below $M_0$ (so that $u$ could be a DC level), then $p_0$ can only be infinity. The only scaling that we can apply is $L_1$ scaling. However, assume that $u$ is also known to have no DC component, and in fact suppose that $\|U\|_2 \le M_0$. Now we can select $p_0$ to have any value between 2 and infinity. For example, a $p_0$ of 2 would correspond to $L_2$ scaling, and a $p_0$ of infinity would correspond to $L_1$ scaling. In this case, we would select the scaling method that would result in lower levels of quantization noise, the $L_2$ scaling method. In general, the larger the $p$ (meaning the smaller $p_0$), the less conservative the scaling rule will be, implying lower noise levels. Thus the more we know about the possible filter input signals, the better the scaling will be in terms of the resulting noise levels. For example, if all we know about the input is that it is smaller than $M_0$ in magnitude, then it could even be $u = M_0$. For this case, $p_0 = \infty$ and $p = 1$. Thus the $L_1$ norm of $F_i(z)$, the area under the $|F_i|$ curve, must be forced to 1. This type of scaling is more conservative (results in more quantization noise) than $L_2$ or $L_\infty$ scaling.

A related deterministic scaling method has been described by Hwang [56]. This method is based on the time-domain $\ell_p$ norm of the infinite sequence $r_i(k)$, defined as:

$$\|r_i\|_p = \left[ \sum_{k=0}^{\infty} |r_i(k)|^p \right]^{1/p} \qquad (5.4)$$

The time-domain counterpart of (5.2) can be written as follows:

$$|r_i(k)| \le \|f_i\|_p \, \|u\|_{p_0} \qquad (5.5)$$

where $f_i(k)$ is the impulse response of node $i$ at time $k$, $u(k)$ is the filter input, and again $1/p + 1/p_0 = 1$. The following scaling law results: if $M_0$ is the maximum signal magnitude allowed in the filter and $\|u\|_{p_0} \le M_0$, then

$$\|f_i\|_p \le 1 \quad \text{for all } i \qquad (5.6)$$

guarantees no overflow.

In order to compare $L_p$ and $\ell_p$ scaling methods, we must examine the relationship between the $L_p$ and $\ell_p$ norms [66]:

$$\|u\|_\infty \le \|U\|_1 \le \|U\|_2 = \|u\|_2 \le \|U\|_\infty \le \|u\|_1 \qquad (5.7)$$

Given the relationship of (5.7), we can determine how conservative any given scaling rule is compared to all the other scaling rules. From (5.7), we know that if the input satisfies the constraint $\|U\|_1 \le M_0$, then it must also satisfy $\|u\|_\infty \le M_0$ (but not vice versa). Thus, as far as the type of input signal is concerned, knowing that the $\ell_\infty$ norm of the input is less than $M_0$ is less restrictive than knowing that its $L_1$ norm is less than $M_0$. We can generalize this statement to the entire list in (5.7). Since a less restricted input corresponds to a more conservative scaling, we can use the relationship (5.7) to determine how any scaling method compares to any other. Thus the most conservative scaling is $\ell_1$ scaling, and the least conservative corresponds to $\ell_\infty$ scaling. The actual scaling method selected will depend on what information is known about the filter input signal and its transform.
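How the choice of $p$ affects conservatism shows up directly in the optimal scaler $k = 1/\|f\|_p$ for a node. The sketch computes the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms of a hypothetical node impulse response $f(k) = 0.9^k$ (truncated at 2000 samples). For sequences $\|f\|_\infty \le \|f\|_2 \le \|f\|_1$, so $\ell_1$ scaling shrinks the node signal the most.

```python
import math


def lp_seq_norm(f, p):
    """Time-domain l_p norm of a (truncated) sequence."""
    if math.isinf(p):
        return max(abs(v) for v in f)
    return sum(abs(v) ** p for v in f) ** (1.0 / p)


f = [0.9 ** k for k in range(2000)]   # hypothetical node impulse response

for p in (1, 2, math.inf):
    norm = lp_seq_norm(f, p)
    # Optimal scaler for this node under l_p scaling: ||k f||_p = 1
    print(p, round(norm, 4), round(1.0 / norm, 4))
```

The norms come out near 10, 2.29, and 1, giving scalers of about 0.1, 0.44, and 1. The $\ell_2$ value equals the $L_2$ value from Parseval's relation, the middle equality of (5.7).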

The second method for establishing dynamic range constraints and choosing scaling multipliers is a stochastic method [18,37,39]. With a random input signal, one considers the probability of overflow at each node rather than trying to prevent overflow completely, which is no longer possible. Scaling will be accomplished by equalizing the probability of overflow at each node. Let us assume that the maximum signal level without overflow is $M_0$, and that the input is a zero-mean Gaussian random process of standard deviation $\sigma_u = M_0/3$. The probability of overflow at the input A/D is then 0.003. The variance of the signal at node $i$ will be equal to

$$\frac{M_0^2}{9} \sum_{k=0}^{\infty} f_i^2(k)$$

But this quantity is just the squared $\ell_2$ norm of $f_i(k)$ multiplied by the input variance. Thus, to equalize the probability of overflow at each node, we must set $\|f_i\|_2 = 1$ for all $i$, which is equivalent to $\ell_2$ or $L_2$ deterministic (optimal) scaling.
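The 0.003 figure follows from the Gaussian tail, $P(|u| > M_0) = \mathrm{erfc}\!\left(M_0/(\sigma_u\sqrt{2})\right) \approx 0.0027$ for $\sigma_u = M_0/3$. The sketch below also uses a hypothetical node impulse response to show that the node standard deviation is $\sigma_u \|f_i\|_2$, so normalizing $\|f_i\|_2$ to one restores the input's overflow probability at that node.

```python
import math


def overflow_probability(sigma, M0=1.0):
    """P(|x| > M0) for a zero-mean Gaussian x with standard deviation sigma."""
    return math.erfc(M0 / (sigma * math.sqrt(2.0)))


M0 = 1.0
sigma_u = M0 / 3.0
p_input = overflow_probability(sigma_u, M0)      # about 0.0027

f = [0.9 ** k for k in range(2000)]              # hypothetical node impulse response
norm2 = math.sqrt(sum(v * v for v in f))
p_node = overflow_probability(sigma_u * norm2, M0)          # node std = sigma_u ||f||_2
f_scaled = [v / norm2 for v in f]                           # l2 scaling: ||f||_2 = 1
p_scaled = overflow_probability(sigma_u * math.sqrt(sum(v * v for v in f_scaled)), M0)
print(p_input, p_node, p_scaled)
```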

In terms of a state-space structure as discussed in Mullis and Roberts [18,37], scaling corresponds to a diagonal similarity transformation of the unscaled structure. In the more general context of Chan's notation or the modified state space representation described in Chapter 3, scaling can be described by a set of diagonal scaling matrices $S_i$. We will essentially follow the presentation of scaling for filters made by Chan [17], but in the context of the modified state space representation. (Thus a delay will be added to the output of the filter structure, as with a compensator, but the structure is still a filter: no external feedback is involved.) We will extend scaling ideas to the control setting in section 5.3.

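A scaled structure has the following modified state space representation (input $y$, output $u$):

$$\tilde{r}_1(k) = \tilde{\Phi}_1 \begin{bmatrix} \tilde{v}(k) \\ y(k) \end{bmatrix}, \quad \tilde{r}_i(k) = \tilde{\Phi}_i \, \tilde{r}_{i-1}(k) \;\; (i = 2, \ldots, q-1), \quad \tilde{v}(k+1) = \tilde{\Phi}_q \, \tilde{r}_{q-1}(k), \quad \tilde{u}(k) = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix} \tilde{v}(k) \qquad (5.8)$$

where $v$ is a vector, $y$ and $u$ are scalars representing the compensator input and output respectively, and the tilde designates scaled quantities as opposed to the original unscaled values written without a tilde. The matrices $\tilde{\Phi}_q, \ldots, \tilde{\Phi}_1$ are related to the matrices $\Phi_q, \ldots, \Phi_1$ by:

$$\tilde{\Phi}_i = S_i \, \Phi_i \, (S_{i-1})^{-1} \quad \text{for } i = q, \ldots, 1 \qquad (5.9)$$

where $S_0 = \mathrm{diag}(S_q, 1)$ (the input $y$ itself is not scaled) and all $S_i$ are diagonal. Since $u(k)$ is scaled, the D/A scale factor must include an extra multiplicative factor $\rho$ equal to the reciprocal of the $(n+1, n+1)$th entry of $S_q$ to convert $\tilde{u}(k)$ to $u(k)$.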

In the context of the modified state space representation, we can now examine stochastic $\ell_2$ scaling using (5.6). Let us partition $\Phi = \Phi_q \cdots \Phi_1$ (defined in section 3.3) as follows:

$$\Phi = \begin{bmatrix} \Phi_{11} & \Phi_{12} \end{bmatrix} \qquad (5.10)$$

where $\Phi_{11}$ is $(n+1) \times (n+1)$ and $\Phi_{12}$ is $(n+1) \times 1$. Assuming infinite-precision coefficients, the states, input, and output of the filter can be related with the following state space of order $n+1$:

$$v(k+1) = \Phi \begin{bmatrix} v(k) \\ y(k) \end{bmatrix} = \Phi_{11} v(k) + \Phi_{12} \, y(k), \qquad u(k) = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \end{bmatrix} v(k) \qquad (5.11)$$

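For this system of equations, with a white input $y$ of variance $\sigma_y^2$, the state covariance matrix $V$ can be written:

$$V = \sigma_y^2 \sum_{j=0}^{\infty} \left(\Phi_{11}^{\,j} \Phi_{12}\right)\left(\Phi_{11}^{\,j} \Phi_{12}\right)^T \qquad (5.12)$$

Let us define the matrix $K_q$ to be $V/\sigma_y^2$. A Lyapunov equation equivalent to (5.12) is usually easier to evaluate for computing $K_q$:

$$K_q = \Phi_{11} K_q \Phi_{11}^T + \Phi_{12} \Phi_{12}^T \qquad (5.13)$$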


The diagonal elements of $K_q$ represent the gains from the input variance to the state node variances. Now we need the gains from the input variance to the intermediate node variances, assuming that the structure is multi-level. Since the intermediate nodes are related to the state nodes via the precedence level matrices $\Phi_1$ through $\Phi_{q-1}$, we can compute a set of matrices $K_i$ whose diagonal elements are the desired gains from the input variance to the variances of the intermediate node vector $r_i$:

$$K_i = \left(\Phi_i \cdots \Phi_1\right) \begin{bmatrix} K_q & 0 \\ 0 & 1 \end{bmatrix} \left(\Phi_i \cdots \Phi_1\right)^T \quad \text{for } i = 1, \ldots, q-1 \qquad (5.14)$$


Stochastic scaling ($\ell_2$ scaling), which equalizes the probability of overflow at all the nodes in the structure including the input, can be realized by forcing all the diagonal entries of the $K_i$ matrices to unity. Thus all the node variances will be the same as the input variance. This scaling is accomplished by applying a diagonal transformation to the unscaled structure, where:

$$[S_i]_{jj} = \left([K_i]_{jj}\right)^{-1/2} \quad \text{for } i = 1, \ldots, q \text{ and all } j \qquad (5.15)$$

The resulting structure (5.8) would have $\tilde{K}_i$ matrices whose diagonal elements are all unity, as desired.
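For a one-level structure ($q = 1$) the filter scaling procedure reduces to computing $K_q$ and one diagonal transformation. The sketch uses a hypothetical stable filter with $n + 1 = 2$ states (the last state is the delayed output $u$). It sums the series for $K_q$ numerically, checks that the result satisfies the equivalent Lyapunov equation, forms $S_q$ from the inverse square roots of the diagonal, and reports the D/A correction $\rho$. Plain Python lists keep the example dependency-free; the helper names are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


# Hypothetical one-level filter, v = [v1, u]:  v(k+1) = P11 v(k) + P12 y(k)
P11 = [[0.5, 0.0],
       [0.3, 0.0]]
P12 = [[1.0],
       [0.2]]

# Series form with sigma_y^2 = 1: K_q = sum_j (P11^j P12)(P11^j P12)^T, truncated
K = [[0.0, 0.0], [0.0, 0.0]]
g = P12
for _ in range(200):
    K = add(K, matmul(g, transpose(g)))
    g = matmul(P11, g)

# The same K_q must satisfy the Lyapunov form K = P11 K P11^T + P12 P12^T
rhs = add(matmul(matmul(P11, K), transpose(P11)), matmul(P12, transpose(P12)))
residual = max(abs(K[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

# Diagonal scaling S_jj = K_jj^(-1/2); the scaled covariance has a unit diagonal
S = [[K[0][0] ** -0.5, 0.0], [0.0, K[1][1] ** -0.5]]
K_scaled = matmul(matmul(S, K), transpose(S))
rho = 1.0 / S[1][1]          # D/A correction for the scaled output state
print(residual, [K_scaled[j][j] for j in range(2)], rho)
```

For these values $K_q = \begin{bmatrix} 4/3 & 0.4 \\ 0.4 & 0.16 \end{bmatrix}$, so the output state is scaled up by $1/0.4$ and the D/A must multiply by $\rho = 0.4$.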

§5.3 Digital Feedback Compensator Scaling

In this section we will discuss the implications of LQG set-point configurations for the issue of compensator scaling, and then adapt the $\ell_2$ stochastic scaling method described in the previous section for filters to the digital feedback compensator.

The scaling issue for digital compensators differs in certain respects from the filtering applications described above. The first of these involves the type of scaling appropriate to LQG systems. Most of the LQG configurations described in Chapter 2 will have set points, in other words, reference inputs for the regulator portion of the design. These nonzero set-point regulators [1] will have the same parameter values as described in Chapter 2, independent of the set point, but the resulting DC compensator input will affect the scaling. As stated before, conservative scaling is required whenever we allow the presence of DC inputs. Specifically, $\ell_2$ scaling is not possible, eliminating the stochastic approach.

Figure 5-2 presents the set-point LQG system described in Kwakernaak and Sivan [1], where $u_r$ is the reference input. If we wish to drive the output $y$ to $y_r$, then $u_r$ must be set to $H_c^{-1}(1)\, y_r$, where $H_c(z)$ is the closed-loop transfer function from $u_r$ to $y$:

$$H_c(z) = H\left(zI - \Phi + \Gamma G\right)^{-1} \Gamma \qquad (5.16)$$

Figure 5-2: Set-Point Compensator Configuration

Unfortunately, this compensator has a DC input, since the steady-state value of $y$ is nonzero. Thus $\ell_2$ scaling is not possible. However, there is one other (equivalent) approach to describing the system of figure 5-2 and the equations of Chapter 2. Define $\xi$, $\upsilon$, and $\eta$ to be the deviations of the states, input, and output from the steady-state values $x_0$, $u_0$, and $y_0$. Thus, $\xi = x - x_0$, $\upsilon = u - u_0$, and $\eta = y - y_0$. As in [1], the following relationship must hold:

$$x_0 = \Phi x_0 + \Gamma u_0 \qquad (5.17)$$

Now, follow through the LQG design equations of Chapter 2 for the (deviations of the) states $\xi$, input $\upsilon$, and output $\eta$. With the actual state, input, and output variables represented by $x$, $u$, and $y$, we can then produce figure 5-3. Thus it is possible to use an alternate LQG set-point configuration where the compensator input has an average value of zero, thereby allowing us to apply stochastic ($\ell_2$) scaling. The disadvantage of this alternate configuration is the necessity of having two reference inputs which must maintain the precise relationship (5.17), typically in the presence of plant parameter uncertainty.

Figure 5-3: Alternate LQG Set-Point Configuration (plant, and compensator (2.14) designed for the $(\xi, \upsilon, \eta)$ system)

This disadvantage will vanish whenever the plant has a series integration (at least one pole at the origin, $s = 0$), which is a very common occurrence in control systems. In fact, an integrator is frequently added to an actuator (part of the plant) to provide desensitivity to constant disturbances. To see the effect of an integrator pole on the configuration of figure 5-3, let us write $u_0$ as $\left[H(I - \Phi)^{-1}\Gamma\right]^{-1} y_0$. However, since the DC gain $H(I - \Phi)^{-1}\Gamma$ blows up if there are any open-loop integrator poles in the plant (poles at $z = 1$), $u_0$ is forced to zero. In other words, if the plant has any series integration, the LQG configuration of figure 5-3 needs only one reference input, $y_0 = y_r$, and not two. Note that the configuration of figure 5-2 does not change when the plant has integrator poles; both compensator inputs will still have DC components, and the system as a whole still requires the reference input $u_r$. From this point on, the figure 5-3 configuration is assumed so that $\ell_2$ scaling can be applied.
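The effect of an integrator on the steady-state input $u_0 = \left[H(I - \Phi)^{-1}\Gamma\right]^{-1} y_0$ can be checked on the two 2-state plants of Chapter 4. The sketch assumes their ZOH discretizations at $T = 3$ and the output matrix $H = [1 \; 0]$. For the harmonic oscillator the DC gain is finite (and equals the continuous-time DC gain of 1); for the double integrator $I - \Phi$ is singular, the discrete counterpart of the pole at $s = 0$ ($z = 1$). The helper `dc_gain` is hypothetical.

```python
import math


def dc_gain(Phi, Gam, H):
    """H (I - Phi)^{-1} Gam for a 2-state plant, or None if I - Phi is singular."""
    m00, m01 = 1.0 - Phi[0][0], -Phi[0][1]
    m10, m11 = -Phi[1][0], 1.0 - Phi[1][1]
    det = m00 * m11 - m01 * m10
    if abs(det) < 1e-12:
        return None                      # pole at z = 1: the DC gain is unbounded
    x0 = [(m11 * Gam[0] - m01 * Gam[1]) / det,     # (I - Phi)^{-1} Gam
          (-m10 * Gam[0] + m00 * Gam[1]) / det]
    return H[0] * x0[0] + H[1] * x0[1]


T, H, y0 = 3.0, [1.0, 0.0], 1.0          # pipelined period, output y = x1, set point
c, s = math.cos(T), math.sin(T)
oscillator = ([[c, s], [-s, c]], [1.0 - c, s])                   # ZOH harmonic oscillator
double_integrator = ([[1.0, T], [0.0, 1.0]], [T * T / 2.0, T])   # ZOH double integrator

gain = dc_gain(*oscillator, H)
print(gain, y0 / gain)                   # u0 = y0 / DC gain
print(dc_gain(*double_integrator, H))    # None: the integrator forces u0 = 0
```

Note that the oscillator result depends on $\cos T \ne 1$; a sampling period equal to a multiple of the oscillation period would also make $I - \Phi$ singular.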

The second difference between filter and compensator scaling arises when we try to apply $\ell_2$ scaling as described in (5.12)-(5.15) to a compensator. This procedure would treat the compensator as a separate entity (functionally a filter), ignoring the LQG plant and feedback path. Yet a compensator operating open-loop need not even be stable, in which case the series (5.12) does not converge. The stochastic scaling method requires the variances of the signal variables at the compensator state nodes so that the matrices $K_i$ and $S_i$ can be computed. Clearly these variances depend on the overall closed-loop performance. Thus we will have to adapt the filter scaling procedure so that it applies to digital feedback compensator scaling.

We have developed the following scaling procedure to account for the LQG feedback system in which the compensator is embedded. The steady-state variances of the $n$ plant states and $n+1$ compensator states can be found by combining the plant and compensator equations into a single augmented state space:

$$\begin{bmatrix} x(k+1) \\ v(k+1) \end{bmatrix} = A \begin{bmatrix} x(k) \\ v(k) \end{bmatrix} + \begin{bmatrix} w_1(k) \\ \Phi_{12} w_2(k) \end{bmatrix} \qquad (5.18)$$

where

$$A = \begin{bmatrix} \Phi & \begin{bmatrix} 0_n & \Gamma \end{bmatrix} \\ \Phi_{12} H & \Phi_{11} \end{bmatrix}$$

and $0_n$ represents an all-zero $n \times n$ matrix, and $\Phi_{11}$ and $\Phi_{12}$ represent the unscaled compensator as partitioned in (5.10). With this state space, let us now follow the general scaling procedure outlined in section 5.2. The overall $(2n+1) \times (2n+1)$ state covariance matrix $Z$ can be computed by solving the following discrete-time Lyapunov equation [16]:

$$Z = A Z A^T + C \qquad (5.19)$$

where

$$C = \begin{bmatrix} \Xi_1 & 0 \\ 0 & \Phi_{12} \Xi_2 \Phi_{12}^T \end{bmatrix}$$

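We now partition $Z$ to separate the plant and compensator covariances:

$$Z = \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{12}^T & Z_{22} \end{bmatrix} \qquad (5.20)$$

where $Z_{11}$ is $n \times n$. As defined after (5.12), $K_q$ will result from dividing $Z_{22}$ by the variance of the compensator input, $\sigma_y^2 = H Z_{11} H^T + \Xi_2$. Thus

$$K_q = \frac{Z_{22}}{\sigma_y^2} \qquad (5.21)$$

To compute $K_i$ as in (5.14), the compensator states and input $y$ must be uncorrelated. However, feedback introduces correlation:

$$E\left\{ \begin{bmatrix} v(k) \\ y(k) \end{bmatrix} \begin{bmatrix} v(k) \\ y(k) \end{bmatrix}^T \right\} = \begin{bmatrix} Z_{22} & Z_{12}^T H^T \\ H Z_{12} & H Z_{11} H^T + \Xi_2 \end{bmatrix} \qquad (5.22)$$

Normalizing by $\sigma_y^2$, we get:

$$K_i = \left(\Phi_i \cdots \Phi_1\right) \begin{bmatrix} K_q & Z_{12}^T H^T / \sigma_y^2 \\ H Z_{12} / \sigma_y^2 & 1 \end{bmatrix} \left(\Phi_i \cdots \Phi_1\right)^T \quad \text{for } i = 1, \ldots, q-1 \qquad (5.23)$$

The scaling matrices now follow directly from (5.15). This scaling technique has been applied for the optimal $\ell_2$ scaling of the compensator structures treated in this thesis.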
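The closed-loop procedure can be sketched for a one-level compensator ($q = 1$), where only $K_q$ is needed. All numbers are hypothetical: a scalar discrete plant ($\Phi = 0.5$, $\Gamma = 1$, $H = 1$), a two-state compensator in modified state space form whose last state is $u$, and noise covariances $\Xi_1 = 0.3$, $\Xi_2 = 0.125$ chosen only as convenient values. The Lyapunov equation is solved here by fixed-point iteration, which converges because the assumed closed loop is stable (spectral radius about 0.77); a production code would more likely use a direct solver.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def solve_dlyap(A, C, tol=1e-13, max_iter=10000):
    """Solve Z = A Z A^T + C by fixed-point iteration (A assumed stable)."""
    m = len(C)
    Z = [row[:] for row in C]
    for _ in range(max_iter):
        AZA = matmul(matmul(A, Z), transpose(A))
        Z_new = [[AZA[i][j] + C[i][j] for j in range(m)] for i in range(m)]
        if max(abs(Z_new[i][j] - Z[i][j]) for i in range(m) for j in range(m)) < tol:
            return Z_new
        Z = Z_new
    raise RuntimeError("iteration did not converge; is the closed loop stable?")


# Hypothetical scalar plant (n = 1) and one-level compensator with v = [v1, u]
Phi, Gam, H = 0.5, 1.0, 1.0
P11 = [[0.3, 0.0], [-0.4, 0.0]]
P12 = [[0.5], [-0.2]]
Xi1, Xi2 = 0.3, 0.125                  # hypothetical noise covariances

# Augmented state [x; v1; u]; the plant input is the last compensator state
A = [[Phi,           0.0,       Gam],
     [P12[0][0] * H, P11[0][0], P11[0][1]],
     [P12[1][0] * H, P11[1][0], P11[1][1]]]
# C = diag(Xi1, P12 Xi2 P12^T)
C = [[Xi1, 0.0, 0.0],
     [0.0, Xi2 * P12[0][0] ** 2, Xi2 * P12[0][0] * P12[1][0]],
     [0.0, Xi2 * P12[1][0] * P12[0][0], Xi2 * P12[1][0] ** 2]]

Z = solve_dlyap(A, C)
sigma_y2 = H * Z[0][0] * H + Xi2                          # variance of y = H x + w2
K_q = [[Z[i][j] / sigma_y2 for j in (1, 2)] for i in (1, 2)]   # Z22 / sigma_y^2
S_q = [K_q[0][0] ** -0.5, K_q[1][1] ** -0.5]               # diagonal of S_q
print(sigma_y2, K_q, S_q)
```

Unlike the filter case, $K_q$ here reflects the closed-loop (colored) compensator input, which is exactly why the open-loop series for $V$ cannot be used.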

The last controller scaling question that arises concerns the A/D and D/A converter scale factors. Once a compensator is scaled via (5.18)-(5.23), the probability of overflow within the compensator equals the probability of overflow at the A/D (for Gaussian A/D inputs). By setting the A/D scale factor (and inversely adjusting the D/A scale factor), we can control this overflow probability. In such a procedure the compensator scaling procedure is unaffected by the A/D scale factor; the scaling multipliers remain invariant. The dynamic range of the input and output transients in the system (caused by changing the set point, for example) and of the set point itself will also affect the actual A/D scaling choice. Whatever is chosen for $k_{AD}$ (and $k_{DA}$ must include a $k_{AD}^{-1}$ factor as well as the $\rho$ factor resulting from the scaling of the compensator output node), the effect of quantization noise on the performance index or on the output noise variance will increase as $k_{AD}^{-2}$.
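The A/D scale-factor tradeoff can be tabulated. The sketch assumes a Gaussian compensator input of hypothetical standard deviation $\sigma_y = 0.2$ relative to an A/D full scale $M_0 = 1$. The overflow probability is then $\mathrm{erfc}\!\left(M_0/(\sqrt{2}\,k_{AD}\sigma_y)\right)$, while, per the argument above, the quantization-noise contribution scales as $k_{AD}^{-2}$. The function name is illustrative.

```python
import math


def ad_tradeoff(k_AD, sigma_y=0.2, M0=1.0):
    """Overflow probability and relative quantization-noise factor for A/D gain k_AD.

    sigma_y and M0 are hypothetical: the compensator input standard deviation and
    the A/D full-scale level. The noise factor follows the k_AD**-2 law.
    """
    p_overflow = math.erfc(M0 / (math.sqrt(2.0) * k_AD * sigma_y))
    return p_overflow, k_AD ** -2


for k_AD in (0.5, 1.0, 5.0 / 3.0, 2.0):
    p, noise = ad_tradeoff(k_AD)
    print(f"k_AD = {k_AD:.3f}   P(overflow) = {p:.1e}   noise factor = {noise:.2f}")
```

The choice $k_{AD} = 5/3$ places the scaled input at $\sigma = M_0/3$, reproducing the 0.003 overflow probability used in the stochastic scaling discussion.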

§5.4 Quantizer Characteristics and Models

In order to analyze the effects of quantization in some tractable and systematic fashion, it is necessary to model the nonlinear operation of quantization. This section will present the roundoff and sign-magnitude truncation quantizer input-output characteristics and the models commonly used for them. A discussion of model validity then follows. We will assume throughout that the fixed-point words representing signal variables have $n_f$ bits to the right of the binary point, and that $\Delta$ is defined to be the quantization step size ($\Delta = 2^{-n_f}$).

Figure 5-4 shows the input-output characteristic of the roundoff quantizer. Let $RO(x)$ be the rounded value of $x$. The error associated with such a quantizer, $e = x - RO(x)$, satisfies (5.24):

$$-\frac{\Delta}{2} < e \le \frac{\Delta}{2} \qquad (5.24)$$

Figure 5-4: Roundoff Quantizer Characteristic (quantizer output $RO(x)$ versus input $x$)
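A minimal roundoff quantizer consistent with the error bound above rounds ties downward, so that $e = \Delta/2$ is allowed but $e = -\Delta/2$ is not (this tie rule is an assumption; the text does not specify one). The sketch checks the bound on a fine grid and compares the mean-square error with the uniform-density value $\Delta^2/12$ that the additive noise model assigns to each quantizer.

```python
import math


def ro(x, n_f):
    """Round x to n_f fractional bits; ties go down so that -D/2 < e <= D/2."""
    delta = 2.0 ** -n_f
    return delta * math.ceil(x / delta - 0.5)


n_f = 4
delta = 2.0 ** -n_f
xs = [k / 4096.0 for k in range(-4096, 4097)]     # fine grid on [-1, 1]
errors = [x - ro(x, n_f) for x in xs]

within_bound = all(-delta / 2 < e <= delta / 2 for e in errors)
ratio = sum(e * e for e in errors) / len(errors) / (delta ** 2 / 12)
print(within_bound, ratio)      # True, and a ratio very close to 1
```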

The model commonly used to represent the roundoff quantization operation is the additive white noise model [57]. In this case, roundoff is modelled linearly as a zero-mean random noise added to the ideal (infinite-precision) signal value. The noise $e$ is assumed to have a uniform density, as shown in figure 5-5, and to be uncorrelated with the quantizer input signal. The validity of this model is an important consideration, since its use simplifies quantization noise analysis a great deal. For a continuous-time quantizer input signal, the usually applied rule of thumb states that the noise model is valid if the input to the quantizer crosses "many" quantization levels between sample times [28]; that is, the input magnitude must fluctuate over a range $\gg \Delta$ in each $T$-second period.

Figure 5-5: White Noise Error Model Density

A detailed analysis of the validity of the additive noise roundoff model has
been carried out by Sripad and Snyder [68] and Sripad [13]. These authors have
established necessary and sufficient conditions on the quantizer input such that
the model is exact. Let $\Phi_x(u)$ be the characteristic function of the quantizer
input x (the Fourier transform of its probability density p(x)). Then:

(1) The density p(e) matches that of figure 5-5 if and only if
$\Phi_x(2\pi l/\Delta) = 0$ for all integers $l \ne 0$.

(2) The noise samples e(k) and e(k+1) are uncorrelated if and only if the joint
characteristic function of the two inputs x(k) and x(k+1) satisfies
$\Phi_{x(k),x(k+1)}(2\pi l_1/\Delta,\, 2\pi l_2/\Delta) = 0$ for all integers
$l_1 \ne 0$, $l_2 \ne 0$.

(3) The quantities e(k) and x(k) are uncorrelated if and only if
$\Phi_x(2\pi l/\Delta) = 0$ and
$\left.\frac{d\Phi_x(u)}{du}\right|_{u = 2\pi l/\Delta} = 0$ for all integers $l \ne 0$.

Unfortunately, these conditions are difficult to verify, since the probability
density function of every quantizer input must be known. Moreover, if a quantizer
input contains any Gaussian noise (typically assumed in control problems, at
least for the A/D input), then none of the above conditions holds exactly,
because the characteristic function of a Gaussian density is never zero.

This validity restriction is not as serious as it seems. Sripad [13] has
investigated the properties of the quantization error given a Gaussian input of
variance $\sigma^2$. From these results it is evident that the error e(k) has an
approximately uniform distribution for $\sigma \gtrsim 0.7\Delta$, a condition that
is not particularly restrictive.
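Condition (1) can be examined directly for a zero-mean Gaussian input. Its characteristic function is $\Phi_x(u) = e^{-\sigma^2 u^2/2}$, so $\Phi_x(2\pi l/\Delta) = e^{-2\pi^2 l^2 \sigma^2/\Delta^2}$, which is never exactly zero but decays very quickly as $\sigma/\Delta$ grows. The short sketch below evaluates this quantity for a few ratios $\sigma/\Delta$; the ratios are illustrative choices, not values taken from [13].

```python
import math

def gaussian_cf_at_lattice(sigma_over_delta, l=1):
    """Phi_x(2*pi*l/Delta) for a zero-mean Gaussian quantizer input.

    With Phi_x(u) = exp(-sigma^2 u^2 / 2), evaluating at u = 2*pi*l/Delta gives
    exp(-2 * pi^2 * l^2 * (sigma/Delta)^2).  Condition (1) requires this to be 0 for l != 0.
    """
    return math.exp(-2.0 * math.pi ** 2 * l ** 2 * sigma_over_delta ** 2)

for ratio in (0.25, 0.5, 0.7, 1.0):
    print(f"sigma = {ratio:4.2f} Delta : Phi(2 pi / Delta) = {gaussian_cf_at_lattice(ratio):.2e}")
```

The l = 1 term dominates the departure from a uniform error density, since the terms for larger |l| decay even faster.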

In considering multiple quantizers (which is the usual case), the question of
the interaction of the quantization errors arises. The above analysis actually
applies to a single quantizer only. When the model is used for all the quantizers
within a complex (recursive) structure, we further assume that all such noise
sources are independent. The validity of this assumption is an even more complex
question. However, as a general technique, the additive noise model has proven
itself quite useful for the analysis of roundoff noise effects in digital filters.
Furthermore, any analysis technique aimed at selecting wordlengths based on the
effects of quantization noise need not be exact anyway: the internal and A/D
wordlengths can only be selected in units of whole bits. When the roundoff noise
model breaks down, it tends to do so in a major way; limit cycles occur. These
oscillations are usually quite evident when they are present (see Chapter 7). For
our analyses, however, we will assume that the uncorrelated additive white noise
model applies.

Sign-magnitude truncation refers to the quantization operation of simply
dropping the extra bits of precision in the quantizer input. The advantages of
this type of quantization are its simplicity (unlike the roundoff case, no extra
hardware is required to implement sign-magnitude truncation) and the fact that it
gives rise to fewer limit cycles. Figure 5-6 shows the input-output
characteristic of this quantizer.

Figure 5-6: Nonlinear Sign-Magnitude Truncation Characteristic

With e = x - SMT(x), the quantization errors are now bounded as follows:

$$ 0 \le e < \Delta \quad \text{for } x \ge 0, \qquad -\Delta < e \le 0 \quad \text{for } x \le 0 \qquad (5.25) $$

For this type of quantization, the modelling problem is more difficult. From
(5.25) we can see that a definite correlation exists between the error and the
input values with sign-magnitude truncation; e(k) is a function of the sign of the
quantizer input x(k). Such noise in a digital structure is termed state-dependent
noise. Although Sripad [13] does present an additive white model for this
quantization operation, the conditions under which the model is valid are too
restrictive for general application. Unlike the roundoff model, the additive
white noise model for sign-magnitude truncation is not even approximately valid.
Claasen, Mecklenbräuker, and Peek [59] have proposed a quasi-linear model for
sign-magnitude truncation:

$$ \mathrm{SMT}(x) = x - \frac{\Delta}{\sqrt{2\pi}\,\sigma}\,x + e \qquad (5.26) $$

where e is an uncorrelated white noise of variance
$\left(\frac{1}{3} - \frac{1}{2\pi}\right)\Delta^2$ and the quantizer input x is assumed
to be a Gaussian process. The dependence on $\sigma$ (the standard deviation of x)
accounts for the quasi-linearity and also for the complexity of using this model
for analysis, since the variance of each quantizer input must be computed. An
efficient technique for evaluating these variances is given in [59].
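The quasi-linear model can be checked by simulation. The sketch below truncates a Gaussian signal toward zero, fits the gain $1 - c$ by least squares, and compares the fitted $c$ and the residual noise variance with $\Delta/(\sqrt{2\pi}\,\sigma)$ and $(1/3 - 1/(2\pi))\Delta^2$. The step size, $\sigma$, and sample size are assumptions chosen for illustration.

```python
import numpy as np

def smt_quantize(x, delta):
    """Sign-magnitude truncation: drop the bits below delta (truncate toward zero)."""
    return delta * np.trunc(x / delta)

delta = 2.0 ** -8                    # assumed step size
sigma = 20.0 * delta                 # assumed standard deviation of the Gaussian input
rng = np.random.default_rng(1)
x = rng.normal(0.0, sigma, size=400_000)
y = smt_quantize(x, delta)

# Least-squares fit of y = (1 - c) x + e:  1 - c = E[x y] / E[x^2].
c_hat = 1.0 - np.mean(x * y) / np.mean(x * x)
resid = y - (1.0 - c_hat) * x        # the noise term e of the quasi-linear model

c_model = delta / (np.sqrt(2.0 * np.pi) * sigma)
var_model = (1.0 / 3.0 - 1.0 / (2.0 * np.pi)) * delta ** 2
print("c_hat / c_model   :", c_hat / c_model)
print("var(e) / var_model:", np.var(resid) / var_model)
```

By construction the fitted noise term is uncorrelated with x, which is what makes the model usable in a linear variance analysis.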

Empirically, the noise variance at the output of a digital filter using
sign-magnitude truncation is typically about 6 to 10 times that of the same filter
using roundoff quantization [60]. Since each additional bit reduces the noise
variance by a factor of 4, one should allow an extra two bits per signal word when
using sign-magnitude truncation in order to produce the same (or better) noise
performance as would result from roundoff quantization. Beyond this qualitative
statement, we will not consider the specific analysis of sign-magnitude
quantization noise effects for control compensators.
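The "two extra bits" rule follows from the fact that the rounding-noise variance scales with $\Delta^2 = 2^{-2n_f}$, so each extra bit divides it by 4. A minimal sketch of that arithmetic:

```python
import math

def extra_bits(variance_ratio):
    """Extra bits needed to reduce a noise variance by `variance_ratio`.

    The noise variance is proportional to Delta^2 = 2^(-2 n_f), so each added
    bit divides it by 4: bits = log2(ratio) / 2.
    """
    return 0.5 * math.log2(variance_ratio)

for ratio in (6.0, 10.0):
    print(f"ratio {ratio:4.1f}: {extra_bits(ratio):.2f} bits -> allow {math.ceil(extra_bits(ratio))}")
```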
§5.5 Roundoff Noise Analysis


This section will examine several methods for evaluating the effects of
quantization noise in digital filters and compensators. As mentioned before, we
will focus on roundoff quantization. For filtering applications, we are typically
concerned with the statistical effects of quantization on the filter output.
Although Jackson [61] examines various norms of the output noise spectrum, the
noise variance (the $L_2$ norm squared) is usually taken to be the metric.

There are two basic methods for computing the output variance resulting
from quantization noise effects, one in the frequency domain and one in the time
domain. The frequency-domain analysis method is an application of residue theory
[62]. Given the noise source $e_i$ of variance $\sigma_e^2$ and the (scaled structure)
transfer function $G_i(z)$ from the noise source to the output node, the output
variance $\sigma_i^2$ due to this noise source can be written:

$$ \sigma_i^2 = \sigma_e^2\,\frac{1}{2\pi j}\oint_{|z|=1} G_i(z)\,G_i(z^{-1})\,z^{-1}\,dz \qquad (5.27) $$

where j represents the square root of -1. The contour integral (5.27) can be
evaluated by factoring $G_i(z)G_i(z^{-1})z^{-1}$ to determine its pole locations. If
m of these poles $z_k$ lie inside the unit circle, then

$$ \sigma_i^2 = \sigma_e^2 \sum_{k=1}^{m} \operatorname{Residue}\left\{ G_i(z)\,G_i(z^{-1})\,z^{-1} \text{ at } z_k \right\} \qquad (5.28) $$

Since every noise source is assumed to be uncorrelated with every other, the
total output variance will simply be the sum of all the $\sigma_i^2$; with
$\sigma_e^2 = \Delta^2/12$ for each rounding source,

$$ \sigma_o^2 = \sum_i \sigma_i^2 = \frac{\Delta^2}{12} \sum_i \frac{1}{2\pi j}\oint_{|z|=1} G_i(z)\,G_i(z^{-1})\,z^{-1}\,dz $$
If we apply this residue method to the A/D quantization noise source, we
obtain an output variance that depends on the filter transfer function H(z). Since
H(z) is independent of the structure chosen, given infinite-precision coefficients,
the effect of the A/D roundoff noise on the filter output variance depends only on
H(z) and the A/D wordlength. For a compensator, the effect of A/D roundoff noise on
J is also structure-independent, given infinite-precision coefficients.
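For an all-pole noise transfer function the residue evaluation (5.28) is easy to mechanize. The sketch below assumes $G_i(z) = 1/(1 + a_1 z^{-1} + \cdots + a_n z^{-n})$ with distinct poles inside the unit circle (the all-pole restriction and the function names are assumptions made to keep the example short), sums the residues of $G_i(z)G_i(z^{-1})z^{-1}$ at those poles, and cross-checks the result against the equivalent time-domain sum $\sum_k g_i^2(k)$ of the squared impulse response. Multiplying either result by $\sigma_e^2 = \Delta^2/12$ gives $\sigma_i^2$.

```python
import numpy as np

def noise_gain_residues(a):
    """Sum of residues of G(z) G(1/z) / z inside |z| < 1 for G(z) = 1 / (1 + a1 z^-1 + ... + an z^-n).

    Writing A(z) = z^n + a1 z^(n-1) + ... + an and Ar(z) = 1 + a1 z + ... + an z^n,
    G(z) G(1/z) / z = z^(n-1) / (A(z) Ar(z)).  The poles inside the unit circle are the
    roots p of A(z); for simple roots the residue is p^(n-1) / (A'(p) Ar(p)).
    """
    A = np.concatenate(([1.0], np.asarray(a, dtype=float)))  # A(z), highest power first
    Ar = A[::-1]                                             # Ar(z), highest power first
    n = len(A) - 1
    dA = np.polyder(A)
    total = sum(p ** (n - 1) / (np.polyval(dA, p) * np.polyval(Ar, p)) for p in np.roots(A))
    return float(np.real(total))

def noise_gain_impulse(a, n_terms=5000):
    """Same quantity from the truncated sum of the squared impulse response g(k)."""
    a = np.asarray(a, dtype=float)
    g = np.zeros(n_terms)
    for k in range(n_terms):
        acc = 1.0 if k == 0 else 0.0
        for i, ai in enumerate(a, start=1):
            if k >= i:
                acc -= ai * g[k - i]
        g[k] = acc
    return float(np.sum(g ** 2))

# Second-order section with poles at 0.9 * exp(+/- j pi/4).
r, theta = 0.9, np.pi / 4
a2 = [-2.0 * r * np.cos(theta), r ** 2]
print(noise_gain_residues(a2), noise_gain_impulse(a2))
print(noise_gain_residues([-0.5]), 1.0 / (1.0 - 0.25))   # first order: 1 / (1 - a^2)
```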

The time-domain approach to analyzing roundoff effects is presented by
Hwang [63] for one-level state space structures and by Chan [17] for the general
multilevel case. In the context of the modified state space representation as
presented in (3.10), the derivation proceeds as follows. Assume that the structure
has already been scaled, so that the factor p described in section 5.2 must be
included to produce the physical output from the scaled output u(k). For a filter
of input y, scaled output u, and scaled states v, the effect of roundoff noise on
the filter states can be described by:


$$ \begin{bmatrix} v(k+1) \\ u(k+1) \end{bmatrix} = \cdots\, \Phi_l\, r_{l-1}(k) + \epsilon_q(k) + \Phi_{12}\,\epsilon_{ad}(k) \qquad (5.29) $$


where $\epsilon_q(k)$ represents the noise sources due to the product quantizations
associated with the precedence level matrices and $\epsilon_{ad}(k)$ represents the A/D
noise source. Recall that all such error sources are assumed to be uncorrelated.
Thus, the roundoff noise covariances can be written:
$$ E\{\epsilon_q(k)\,\epsilon_q'(k)\} = \frac{\Delta^2}{12}\,\Lambda_f, \qquad E\{\epsilon_{ad}^2(k)\} = \frac{\Delta_{ad}^2}{12} \qquad (5.30) $$



where $\Delta$ is the internal quantization step size of the structure, $\Delta_{ad}$ is the
A/D quantization step size, and $\Lambda_f$ is a diagonal matrix whose (i,i) entry
equals the number of non-integer coefficients in the i-th row of the corresponding
precedence level matrix, that is, the number of roundoff error sources associated
with the i-th component of $r_f$. This expression assumes that roundoff occurs after
every non-trivial product. If double-precision adders are used as described in
section 5.1, then simply replace all the non-zero entries of $\Lambda_f$ in (5.30)
with ones.

To use (5.29) and (5.30) in computing the output variance, we can take either
of the approaches used in section 5.2 for computing variances; that is, either the
infinite series of (5.12) or the Lyapunov equation of (5.13) can be used. For the
infinite-series approach, we would have to approximate the series by computing
only a finite number of terms. The closer any of the poles of the system (5.29)
are to the unit circle, the more terms will be required for an acceptable
approximation [63]. Consequently, we will use the Lyapunov equation method.

The steady-state (scaled) state covariance matrix V can be computed by 
solving the following Lyapunov equation: 


(7-^^ (6.31) 

where 
The output variance of u will simply be equal to the lower right-hand corner
entry of V. Note that the above equations for roundoff analysis are solved using
infinite-precision coefficients for simplicity. The insertion of the actual
finite-wordlength coefficients would only change the results in a minor way. (In
this case, there will also be a slight dependence on structure for the A/D noise
contribution.) The use of infinite-precision coefficients is especially justified
when one recalls that the selection of an internal or A/D wordlength can only be
made in terms of whole bits.
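A minimal sketch of the Lyapunov approach for a one-level structure, under assumptions of our own: the augmented state is [v; u], each row of the hypothetical matrix `Phi` has one non-trivial coefficient (so each row carries one rounding source of variance $\Delta^2/12$), and A/D noise is ignored. The equation $V = \Phi V \Phi' + Q$ is solved by vectorization, the output variance is read from the lower right-hand entry of V, and the truncated infinite series is included to show how its convergence slows as a pole approaches the unit circle.

```python
import numpy as np

def solve_dlyap(Phi, Q):
    """Solve V = Phi V Phi' + Q (Phi stable) via vec(V) = (I - Phi kron Phi)^-1 vec(Q)."""
    n = Phi.shape[0]
    vec_v = np.linalg.solve(np.eye(n * n) - np.kron(Phi, Phi), Q.reshape(-1))
    return vec_v.reshape(n, n)

def series_terms_needed(Phi, Q, rtol=1e-6, max_terms=100_000):
    """Number of terms of sum_k Phi^k Q Phi'^k needed to match the Lyapunov solution to rtol."""
    V_exact = solve_dlyap(Phi, Q)
    V, term = np.zeros_like(Q), Q.copy()
    for k in range(1, max_terms + 1):
        V += term
        if np.max(np.abs(V - V_exact)) <= rtol * np.max(np.abs(V_exact)):
            return k
        term = Phi @ term @ Phi.T
    return max_terms

delta = 2.0 ** -10                 # assumed internal step size
q = delta ** 2 / 12                # variance of one rounding source
a, c = 0.9, 0.5                    # hypothetical state and output coefficients
# v(k+1) = a v(k) + e_v(k),  u(k+1) = c v(k) + e_u(k)
Phi = np.array([[a, 0.0], [c, 0.0]])
Q = q * np.diag([1.0, 1.0])        # (Delta^2 / 12) * Lambda with one source per row
V = solve_dlyap(Phi, Q)
print("output variance / q:", V[1, 1] / q)      # closed form: c^2 / (1 - a^2) + 1
for pole in (0.9, 0.99):
    P = np.array([[pole, 0.0], [c, 0.0]])
    print("pole", pole, "-> series terms:", series_terms_needed(P, Q))
```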

Now let us adapt this approach for the digital feedback compensator.
Again, we need to consider the behavior of the closed-loop system, as was done by
Knowles and Edwards [7] and Curry [8] for sampled-data systems, and by Sripad
[13]. Curry [8] has considered the second moment of the system output error due to
rounding for a specific sampled-data control system with a direct form II
compensator structure. Knowles and Edwards [7] also used the additive white noise
model for generating a bound on the quantization noise effects of direct form II,
cascade, and parallel compensator structures. Sripad [13] considered the increase
in the performance index J due to roundoff, using the additive white noise model,
but did not consider either the scaling issue or an accurate and general notion of
a compensator structure. Our results will be more general, since we can consider
any type of compensator structure, and they will of course be adapted from the
digital filtering approach described above. The factors p and $k_{ad}$ described in
section 5.2 must now be explicitly included in the analysis procedure. The scaled,
augmented plant/compensator system, including roundoff noise sources (but not
plant or measurement noises), can be written:


$$ \begin{bmatrix} x(k+1) \\ v(k+1) \\ u(k+1) \end{bmatrix} = \cdots \begin{bmatrix} x(k) \\ v(k) \\ u(k) \end{bmatrix} + \cdots \qquad (5.32) $$
The resulting (scaled) state covariance matrix Z (due only to roundoff noise) will
be the solution to the following Lyapunov equation:


2«2z2' + 


0 0 
.0 0 . 


(6.33) 


The covariance matrix Z can be related to the performance index J by using the
trace form of J, equivalent to (2.6):

$$ J = \operatorname{trace}(Q\,xx') + 2\,\operatorname{trace}(M\,ux') + \operatorname{trace}(R\,uu') = \operatorname{trace}(\Gamma Z) \qquad (5.34) $$

where $\Gamma$ is the weighting matrix formed from Q, M, and R (with the scaling
factors included), partitioned conformably with the augmented state. By solving
(5.33) and evaluating (5.34) for the scaled system covariance matrix Z, we can
compute the increase $\Delta J$ due to roundoff noise alone. Again, the
infinite-precision coefficient values of the structure are used.

The analysis procedure described above extends easily to multiple-input
multiple-output structures, but, as described in Chapter 8, the scaling issue is
more complex.


§5.6 Minimum Roundoff Noise Structures

Now that an analytic technique for treating roundoff noise effects has been
presented, both for digital filters and for digital compensators, we can describe
minimum roundoff noise structures (see Chapter 3). First, we will present the
one-level minimum roundoff noise filter structure derived by Mullis and Roberts
[18,37,38] and Hwang [39], and then we will adapt the technique to produce a
one-level minimum roundoff noise compensator structure. Assume that a one-level
filter structure has been $L_2$ scaled using (5.8)-(5.15), and that the roundoff noise
could be evaluated with (5.31). (Neglect A/D noise.) For one level, (5.31) can be
rewritten to include scaling:

$$ V_1 = S_1 \Phi_{11} S_1^{-1}\, V_1\, (S_1^{-1})' \Phi_{11}' S_1 + \frac{\Delta^2}{12}\,\Lambda_1 \qquad (5.35) $$

Recall that $\Lambda_1$ is a diagonal matrix whose j-th diagonal entry equals the
number of roundoff error sources represented in the j-th row of the one-level
coefficient matrix, and that the scaling matrix $S_1$ is diagonal. The output
variance due to product quantization can be expressed with the following trace:

*0 " trace ^ & (6.36) 

where 



By substituting $V$ for $S_1^{-1} V_1 S_1^{-1}$, we can rewrite (5.35) and (5.36):




$$ V = \Phi_{11} V \Phi_{11}' + \frac{\Delta^2}{12}\,\Lambda_1 S_1^{-2} \qquad (5.37) $$


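$$ \sigma_0^2 = p^2\,\operatorname{trace}\left(\Pi S_1 V S_1\right) = \operatorname{trace}\left(p^2 S_1 \Pi S_1 V\right) = \operatorname{trace}\left(\bar\Pi V\right) \qquad (5.38) $$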

Using the theory of adjoint operators [1], computing $\sigma_0^2$ via (5.37) and (5.38)
is exactly equivalent to solving the following adjoint Lyapunov equation and
evaluating the trace of (5.40) (see Appendix B):




$$ W_1 = \Phi_{11}' W_1 \Phi_{11} + p^2 S_1 \Pi S_1 \qquad (5.39) $$


$$ \sigma_0^2 = \frac{\Delta^2}{12}\,\operatorname{trace}\left(\Lambda_1 S_1^{-2} W_1\right) \qquad (5.40) $$


This alternate expression for roundoff noise will be important in the development
of an iterative constrained optimization technique for minimizing roundoff noise
effects, both for filter structures (see Chan [17]) and for compensator structures
(see Chapter 8).
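The equivalence invoked above rests on a simple identity: if $V = \Phi V \Phi' + Q$ and $W = \Phi' W \Phi + \Pi$, then $\operatorname{trace}(\Pi V) = \operatorname{trace}(Q W)$, because both equal $\sum_k \operatorname{trace}(\Pi\,\Phi^k Q\,\Phi'^k)$. The sketch below checks this numerically for a randomly generated stable $\Phi$ and positive semidefinite $Q$ and $\Pi$ (all hypothetical test data). The practical payoff is that W depends only on the structure and the output weighting, so the noise weights in Q (here $\frac{\Delta^2}{12}\Lambda_1 S_1^{-2}$) enter only through a trace.

```python
import numpy as np

def solve_dlyap(Phi, Q):
    """Solve V = Phi V Phi' + Q by vectorization (Phi assumed stable)."""
    n = Phi.shape[0]
    vec_v = np.linalg.solve(np.eye(n * n) - np.kron(Phi, Phi), Q.reshape(-1))
    return vec_v.reshape(n, n)

rng = np.random.default_rng(2)
n = 4
Phi = rng.normal(size=(n, n))
Phi *= 0.8 / np.max(np.abs(np.linalg.eigvals(Phi)))   # scale spectral radius to 0.8
B = rng.normal(size=(n, n))
Q = B @ B.T                                            # noise covariance (PSD)
C = rng.normal(size=(n, n))
Pi = C @ C.T                                           # output weighting (PSD)

V = solve_dlyap(Phi, Q)          # forward equation:  V = Phi V Phi' + Q
W = solve_dlyap(Phi.T, Pi)       # adjoint equation:  W = Phi' W Phi + Pi
print(np.trace(Pi @ V), np.trace(Q @ W))
```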

Using the expression in (5.39) and the Lyapunov equation (5.13) for $K_1$, Mullis
and Roberts [18] and Hwang [39] present a method for determining the structure
that minimizes $\sigma_0^2$. The matrix $\Lambda_1$ is assumed to be the identity I (for
double-precision adders) or (n+1)I (for the case of single-precision adders and
n+1 coefficients per row). With $L_2$ scaling, $S_1^{-2}$ is proportional to the diagonal
of $K_1$, so each term of the trace in (5.40) is proportional to a product
$[K_1]_{ii}[W_1]_{ii}$. Since the (n+1)-th term in this summation is not alterable by a
similarity transform, we can ignore it for now and deal only with K and W, the
upper n×n portions of $K_1$ and $W_1$. Thus we must minimize the following sum:

$$ \sum_{i=1}^{n} K_{ii}\,W_{ii} \qquad (5.41) $$


If P is an n×n (similarity) transformation matrix, then the product KW can be
shown to transform to $P^{-1}KWP$. Thus the n eigenvalues of $P^{-1}KWP$ are
invariant under transformation by P. These eigenvalues, $\mu_i^2$, define the
second-order filter modes $\mu_i$. Mullis and Roberts [37] prove the following
inequality. If K and W are n×n symmetric positive-definite matrices, then




$$ \sum_{i=1}^{n} K_{ii} W_{ii} \ge \frac{1}{n}\left(\sum_{i=1}^{n} \mu_i\right)^2 \qquad (5.42) $$


An (optimal) transformation exists such that the transformed $K_t$ and $W_t$
($K_t = P^{-1}K(P')^{-1}$, $W_t = P'WP$) satisfy (5.42) with equality. Thus the minimum
roundoff noise possible, using $(n+1)^2$ coefficients (in general) and quantization
after every non-trivial multiplication, can be expressed:


) 2 m —— 

v O'opt \z 


TF (6 ' 43) 

V ' ' 4 


assuming we know some $K_1$ and $W_1$ and can solve for the eigenvalues of KW.
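The invariance of the second-order modes and the bound (5.42) are easy to check numerically. The sketch below assumes the standard Mullis and Roberts definitions for an n-th order state model (A, b, c), namely $K = AKA' + bb'$ and $W = A'WA + c'c$ (the n×n counterparts of $K_1$ and $W_1$ here), computes $\mu_i = \sqrt{\lambda_i(KW)}$, compares $\sum_i K_{ii}W_{ii}$ with $\frac{1}{n}(\sum_i \mu_i)^2$ for a direct-form realization, and confirms that an arbitrary similarity transformation P leaves the $\mu_i$ unchanged. The third-order filter is a made-up example.

```python
import numpy as np

def solve_dlyap(Phi, Q):
    """Solve V = Phi V Phi' + Q by vectorization (Phi assumed stable)."""
    n = Phi.shape[0]
    vec_v = np.linalg.solve(np.eye(n * n) - np.kron(Phi, Phi), Q.reshape(-1))
    return vec_v.reshape(n, n)

def second_order_modes(K, W):
    """mu_i = square roots of the eigenvalues of K W, in decreasing order."""
    lam = np.real(np.linalg.eigvals(K @ W))
    return np.sqrt(np.sort(lam)[::-1])

def mullis_roberts_bound(mu):
    """Lower bound (1/n) (sum mu_i)^2 on sum_i K_ii W_ii, as in (5.42)."""
    return np.sum(mu) ** 2 / len(mu)

# Hypothetical 3rd-order filter, direct-form (companion) realization.
# Denominator z^3 - 1.9 z^2 + 1.24 z - 0.306 has poles 0.9 and 0.5 +/- 0.3j.
A = np.array([[1.9, -1.24, 0.306],
              [1.0,  0.0,  0.0],
              [0.0,  1.0,  0.0]])
b = np.array([1.0, 0.0, 0.0])
c = np.array([1.0, 0.5, 0.25])

K = solve_dlyap(A, np.outer(b, b))        # K = A K A' + b b'
W = solve_dlyap(A.T, np.outer(c, c))      # W = A' W A + c' c
mu = second_order_modes(K, W)
noise_sum = np.sum(np.diag(K) * np.diag(W))
print("sum K_ii W_ii:", noise_sum, " bound:", mullis_roberts_bound(mu))

# A similarity transformation P changes K and W but not the modes.
P = np.eye(3) + 0.3 * np.random.default_rng(3).normal(size=(3, 3))
P_inv = np.linalg.inv(P)
K_t = P_inv @ K @ P_inv.T                 # K_t = P^-1 K (P')^-1
W_t = P.T @ W @ P                         # W_t = P' W P
print(np.allclose(second_order_modes(K_t, W_t), mu))
```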

If in fact we restrict ourselves to the block optimal parallel structure [37]
with its (fewer) 4n+1 coefficients, then we are constraining the transformation P
to be block diagonal, and (5.42) cannot in general be satisfied with equality.
However, (5.42) will be true for each second-order section (n = 2). Thus the
minimum block optimal product variance can be written:




(6.44) 


**•}**■ « 


V 


<*0^o " 


A 2 

r 

12 


2 ( ( " t1) [' t i] B .i J , + i[“'i] nt i, nt1 *|(<‘i t <‘ 2 ) 2 + f (*a** a ) z * •■■} 


This equation in fact suggests a new result: a pairing algorithm for real poles.
Once the modes of KW are determined, (5.44) will be minimized by pairing modes so
that each pair of modes sums to approximately the same quantity as every other
pair, since for a fixed total $\sum_i \mu_i$ a sum of squared pair sums is smallest
when the pair sums are equal. In fact, (5.44) may even be lower than (5.43) due to
the reduced number of coefficients (noise sources).
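The pairing rule can be illustrated by brute force. The sketch below assumes, for simplicity, that each second-order section contributes $\frac{1}{2}(\mu_a + \mu_b)^2$ (the bound (5.42) with n = 2) and ignores the noise-source-count weights and the output term of (5.44); it enumerates all pairings of the modes and keeps the cheapest. The mode values are invented for illustration. With exactly equal pair sums, the cost reaches the unconstrained bound $\frac{1}{n}(\sum_i \mu_i)^2$.

```python
def pairings(items):
    """Yield every way of splitting an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for others in pairings(remaining):
            yield [(first, partner)] + others

def pairing_cost(pairs):
    """Simplified block-optimal cost: sum over sections of (1/2)(mu_a + mu_b)^2."""
    return sum(0.5 * (a + b) ** 2 for a, b in pairs)

def best_pairing(mu):
    """Exhaustive search (adequate for the handful of modes in a compensator)."""
    return min(pairings(list(mu)), key=pairing_cost)

mu = [1.0, 2.0, 3.0, 4.0]                      # hypothetical second-order modes
best = best_pairing(mu)
print(best, pairing_cost(best), sum(mu) ** 2 / len(mu))
```

Exhaustive search grows as (n-1)(n-3)...1 pairings, which is trivial for a sixth-order compensator (15 pairings) but would call for a heuristic at much higher orders.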

The one-level minimum roundoff noise structure developed above can be
extended to the case of one-level compensators. Again, we can neglect the A/D
noise contribution, which is invariant to structural transformation. Equation
(5.33) can be rewritten in terms of its unscaled compensator parameters as
follows:


$$ Z = T A T^{-1}\, Z\, T^{-1} A' T + \frac{\Delta^2}{12}\begin{bmatrix} 0 & 0 \\ 0 & \Lambda_1 \end{bmatrix} \qquad (5.45) $$


where 


and 
By recognizing that the unscaled covariance matrix $\bar Z$ just equals $T^{-1} Z T^{-1}$,
we can write (5.45) in a manner similar to (5.37) to produce:


$$ \bar Z = A \bar Z A' + \frac{\Delta^2}{12}\begin{bmatrix} 0 & 0 \\ 0 & \Lambda_1 S_1^{-2} \end{bmatrix} \qquad (5.46) $$


The expression for the increase in performance index due to roundoff noise for
the scaled system can also be written in terms of the unscaled covariance matrix
$\bar Z$ (see (5.34)):


dJ - trace |tz| - trace |Tr -1 zr _1 J 


(6.47) 


■ trace 


: trace 


(r-Jrr^z) 

M . 


Using an adjoint Lyapunov equation, as in (5.39) and (5.40), we can express
(5.46) and (5.47) as follows:


W - AIWA + T 


(6.48) 
(5.49)



If we define $W_1$ to be the lower right-hand (n+1)×(n+1) portion of W, then:

$$ \Delta J = \operatorname{trace}\left(\Lambda_1 S_1^{-2} W_1\right) \qquad (5.50) $$

This expression is identical in form to the expression in (5.40). From this point
on, the derivation of a one-level minimum roundoff noise compensator structure is
exactly the same as the Mullis and Roberts and Hwang procedure discussed above (see
(5.40)-(5.44)).

Conceptually, the technique described above could be extended to multiple
levels. However, the iterative structure optimization procedure considered in
Chapter 8 is far more useful for minimizing roundoff noise.

§5.7 The F8 Example and Compensator Roundoff Noise

This section will examine the roundoff noise and scaling associated with
some of the structures discussed in Chapter 3 for an actual sixth-order LQG
system. This system is a simplified version of the longitudinal dynamics of the F8
fighter aircraft at flight condition 12 (an altitude of 20,000 feet and a speed of
mach 8) [64]. Longitudinal control of the aircraft is restricted to the elevator
alone, and a single measurement y is formed; these simplifications make the plant
model single-input single-output, so that all our analysis techniques directly
apply. The actual multiple-input multiple-output model could be considered with our
techniques, but certain additional issues arise, as discussed in Chapter 9.
First-order actuator dynamics are included in the plant model, and a series
integrator is also added. Thus the configuration of figure 5-3 will only have one
reference input. Appendix A presents the continuous-time plant model in detail.
The sample rate (10 Hertz) is selected to be well above the highest plant pole
frequency (12 radians/second, or about 1.9 Hertz). Thus T equals 0.1 seconds. The
resulting discrete-time model parameters are also shown in Appendix A.

For this plant model, the design equations of Chapter 2 were followed. The
resulting K and G vectors are also given in Appendix A. All calculations were done
in double precision (16 digits, or 64 bits), so that the system parameters and the
K and G vectors are effectively infinite-precision quantities. The resulting
performance index J is 0.00176477. This number is then taken to be the ideal value
of the performance index, and degradation is measured relative to it.

To five significant digits, the poles and zeros of the (ideal) compensator
transfer function are:

Poles:
  $z_{p1}$ = 0.29179
  $z_{p2}$ = 0.68904
  $z_{p3}$ = 0.99614
  $z_{p4}$ = 0.99869
  $z_{p5}$, $z_{p6}$ = 0.73149 ± j0.40220

Zeros:
  $z_{z1}$ = 0.30119
  $z_{z2}$ = 0.06728
  $z_{z3}$ = 0.99878
  $z_{z4}$, $z_{z5}$ = 0.88189 ± j0.26766

Figure 5-7: F8 Compensator Poles and Zeros


Note that, unlike higher-order digital filters, this compensator has many real
poles and zeros. This fact complicates the pairing issue for parallel and cascade
structures. Note also the presence of poles and zeros very near the unit circle at
z = +1; these singularities can be critical in determining an acceptable structure.
Before discussing the different structures tested, the structure-independent
A/D noise contribution will be considered. If we allow a 6% increase in J due to
this single noise source, then the procedure outlined in (5.32)-(5.34), using only
the A/D noise source, results in a 4.98-bit A/D wordlength. (This number does not
include the sign bit.) Typically, for filtering applications, the A/D wordlength
need not be as long as the structure's internal wordlength; the same result
appears for this control and estimation application, as will be seen below.
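The non-integer wordlength quoted above comes from inverting the noise budget. If the A/D noise increases J by $\Delta J = g\,\Delta_{ad}^2/12$, where g is the noise gain from the A/D source to J obtained from the Lyapunov solution, then requiring $\Delta J \le \epsilon J$ with $\Delta_{ad} = 2^{-b}$ gives $b = \frac{1}{2}\log_2\!\left(g/(12\,\epsilon J)\right)$. The sketch uses the ideal J and the 6% allowance from the text, but the gain g below is a hypothetical placeholder, not the F8 value, so its output does not reproduce the 4.98 bits.

```python
import math

def bits_for_budget(gain, J_ideal, fraction):
    """Fractional bits b such that gain * (2^-b)^2 / 12 equals fraction * J_ideal."""
    return 0.5 * math.log2(gain / (12.0 * fraction * J_ideal))

def delta_J(gain, bits):
    """Increase in J from one rounding source with step 2^-bits and noise gain `gain`."""
    return gain * (2.0 ** -bits) ** 2 / 12.0

J_ideal = 0.00176477      # ideal performance index from the text
fraction = 0.06           # allowed relative increase (6%)
gain = 0.5                # hypothetical noise gain, for illustration only

b = bits_for_budget(gain, J_ideal, fraction)
print(f"required A/D bits: {b:.2f} (use {math.ceil(b)} plus the sign bit)")
```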

Ten structures were evaluated in terms of their product roundoff noise
effects on J: the direct form II, five parallel forms (including a block optimal
structure), three cascade structures, and the simple structure of equation (3.26).
The direct form II structure (a) has been described in figure 3-6 and equation
(3.20), and has 13 coefficients, including a single scaling multiplier. The first
parallel structure (b) is composed of five direct form II sections, one
second-order (for the complex pole pair) and four first-order. Each section
requires its own scaler, so this structure has a total of 17 coefficients. The next
two parallel structures use three second-order sections, and hence the issue of how
we pair the four real poles into two sections must be addressed. (There are three
different ways.) Parallel structure (c) pairs each of the two near-unit-circle
poles ($z_{p3}$ and $z_{p4}$) with one of the other real poles, separating them, while
structure (d) pairs these two poles together (see Appendix A). Each structure will
require three scalers, for a total of 16 coefficients.

Structures (a) through (d) are all direct form II-based and thus require two
precedence levels. Parallel structure (e) is a one-level structure produced by
computing the product $\Phi_2\Phi_1$ of the two precedence level matrices of structure
(c) and using the result as a one-level structure (see section 3.3). This structure
will still be a parallel combination of second-order sections; each section will be
a one-level version of a direct form II section. This structure will have 16
coefficients, one more than (c) or (d). Parallel structure (f) is a minimum
roundoff noise block optimal structure as in equation (3.27) and uses the same pole
pairing as parallel structures (c) and (e).

As mentioned in Chapter 3, cascade structures involve the issues of pairing
and ordering: in addition to the pairing issues encountered with the parallel
structure, the zeros must be paired, and the sections must be ordered. Jackson
[61] has described general section ordering and pairing criteria. Consider the
i-th second-order section:

$$ H_i(z) = \frac{1 + a_{i1} z^{-1} + a_{i2} z^{-2}}{1 + b_{i1} z^{-1} + b_{i2} z^{-2}} \qquad (5.51) $$

Complex pole pairs and complex zero pairs that are nearest each other are placed in
the same section (paired). Nearness means that we try to pair poles and zeros so as
to minimize the peak magnitude ($L_\infty$ norm) of $H_i(z)$ for all i. As for section
ordering, when direct form II sections are used (with scaling), the noise variance
of the filter output tends to be minimized by ordering the sections in terms of
increasing $P_i$, where

$$ P_i = \frac{\|H_i\|_\infty}{\|H_i\|_2} \qquad (5.52) $$

(This is not a precise result.) This guideline must be changed if the $L_\infty$ norm of
the output is our performance gauge, or if the direct form II section is not used.
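As a rough illustration of the ordering guideline, the sketch below evaluates the peaking factor $P_i = \|H_i\|_\infty/\|H_i\|_2$ (assumed here to be the quantity used in the ordering rule), approximating both norms on a uniform frequency grid, for two hypothetical sections of the form (5.51) that share a double zero at z = -1 but differ in pole radius, and then sorts them by increasing $P_i$. Both the grid approximation and the example coefficients are assumptions.

```python
import numpy as np

def section_magnitude(a, b, n_freq=8192):
    """|H(e^jw)| on [0, pi] for H(z) = (1 + a1 z^-1 + a2 z^-2) / (1 + b1 z^-1 + b2 z^-2)."""
    w = np.linspace(0.0, np.pi, n_freq)
    zinv = np.exp(-1j * w)
    num = 1.0 + a[0] * zinv + a[1] * zinv ** 2
    den = 1.0 + b[0] * zinv + b[1] * zinv ** 2
    return np.abs(num / den)

def peaking_factor(a, b):
    """P = ||H||_inf / ||H||_2, both norms approximated on the frequency grid.

    For real coefficients |H| is even in w, so (1/2pi) * integral over [-pi, pi]
    of |H|^2 equals the average of |H|^2 over [0, pi].
    """
    mag = section_magnitude(a, b)
    return mag.max() / np.sqrt(np.mean(mag ** 2))

def pole_coeffs(r, theta):
    """Denominator coefficients (b1, b2) for poles at r * exp(+/- j theta)."""
    return (-2.0 * r * np.cos(theta), r ** 2)

sections = {
    "mild":   ((2.0, 1.0), pole_coeffs(0.5, np.pi / 3)),
    "peaked": ((2.0, 1.0), pole_coeffs(0.95, np.pi / 3)),
}
order = sorted(sections, key=lambda name: peaking_factor(*sections[name]))
for name in order:
    print(name, round(float(peaking_factor(*sections[name])), 2))
```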
Furthermore, Jackson does not consider the pairing of real poles. Dehner [66]
and Hwang [60] develop general sub-optimal algorithms for selecting good pairing and
ordering, but these methods tend to require significant computer time to figure out
the ordering and pairing for higher-order filters, and they still do not address
the pairing of real poles. Our roundoff analysis will consider just two different
cascade pairings/orderings. Cascade structure (g) consists of an arbitrarily-chosen
arrangement of poles and zeros (see Appendix A): section 1 contains the complex
pole pair ($z_{p5}$, $z_{p6}$) and a real zero; section 2 contains the near unit-magnitude
real poles $z_{p3}$ and $z_{p4}$ and the complex zero pair ($z_{z4}$, $z_{z5}$); and section 3
contains the near unit-magnitude real zeros with the real poles $z_{p1}$ and $z_{p2}$.
Cascade structure (h) splits the near unit-magnitude poles and zeros, and puts the
complex pole and zero pairs together in the same section (see Appendix A). Both (g)
and (h) require three scalers, a total of 16 coefficients, and four precedence
levels (see (3.22) and figure 3-6). Cascade structure (i) has the same ordering and
pairing as (g), but uses direct form I sections as described in (3.23) and figure
3-7. Hence it has different scaling than (g), different scaled coefficients, and
fewer scalers.

Finally, the simple structure (j) of (3.26) is treated, since this structure (or
a one- or two-level version of it) has often been used, even though this structure
(scaled) requires an excessive 60 coefficients for the F8 system example.

Appendix A contains the actual modified state space representations of all
ten of these ($L_2$-scaled) structures. The ideal values of these coefficients are
presented in double precision.

Figure 5-8 summarizes the product roundoff noise results (that is, the noise
caused by the rounding of multiplier products, not A/D rounding) for these ten
structures, assuming optimal $L_2$ scaling and not accounting for the finite
wordlengths of the coefficients themselves. The 'levels' column lists the number of
precedence levels, and the 'N' column lists the number of coefficients, including
scalers, in the structure. The roundoff noise results are presented in terms of
the number of signal (wordlength) bits that are required to hold the increase in J
due to product roundoff noise to 6% of the ideal value. Again, these numbers do not
include the sign bit. Two wordlengths are presented for each structure. The
left-hand column ('spa', the larger) corresponds to the case of roundoff after
every nontrivial multiplication and single-precision adders, while the right-hand
column ('dpa') corresponds to the case of double-precision adders and quantization
after addition. The last column of figure 5-8 shows the maximum and minimum
magnitudes of the scaled coefficients and is important in determining the
coefficient wordlength. The wider the range of values, the more fixed-point
coefficient bits will probably be needed to achieve a given level of performance
(see Chapter 8).

                                             wordlength       max/min
structure                         levels  N    spa     dpa     scaled coefficient
(a) direct form II                   2    13   10.86   18.26   30060/0.12
(b) parallel direct form II          2    17    8.06    7.46   1.6/0.0046
(c) parallel direct form II          2    16   10.18    0.30   10.6/0.073
(d) parallel direct form II          2    16   14.74   13.04   16.7/0.0016
(e) parallel, 1-level version of (c) 1    16    9.78    8.00   6.3/0.073
(f) block optimal parallel           1    25    7.88    7.06   1.1/0.0029
(g) cascade direct form II           4    16   16.69   14.68   1101/0.00062
(h) cascade direct form II           4    16   10.61    0.47   36/0.073
(i) cascade direct form I            a    14   16.62   14.38   320/0.012
(j) simple                           3    60    0.01    7.64   1.6/0.0000003

Figure 5-8: Roundoff Noise Results

From figure 5-8 we can see that the different pole pairings associated with
parallel structures (c) and (d) produced results that differed by 4.6 bits. Placing
the near-unit magnitude poles in different sections was quite effective. Similarly,
of the two cascades (g) and (h), the one with these same two poles in different
sections required 6.2 fewer bits. Clearly the pairing/ordering issue is not a
trivial question.

Structure (b), the combination of first- and second-order parallel sections,
with its 17 coefficients, outperformed every other structure except the block
optimal. Even so, the extra 8 coefficients of the block optimal structure with
second-order sections only gained 0.2 bits of performance over this structure.
Thus, when evaluating different structures, it is important to know the block
optimal result (for various pairings) so that we can judge whether a suboptimal
structure like (b) is effective enough; in this case it clearly is. If we are
constrained to one level, then (e) is probably best, given its 9 fewer coefficients
than the optimal and only 1.9 bits poorer performance. Actually, in this case one
should also check the performance of a one-level version of (b).

As expected from the literature on digital filters, the direct form II has very
poor noise performance. It is interesting to note also that the simple structure,
with its many coefficients (and hence many noise sources), performed excellently.
It is not clear whether this would be true for the simple structure in general.

The second wordlength column in figure 5-8 shows the gain possible when
using double-precision adders and fewer quantizers. Depending on the structure
tested, a savings of from 0.6 to 1.47 bits was realized. Whether this small savings
is enough to justify the higher-precision adders will depend on the particular
application.
§5.8 Summary

Briefly then, we can summarize the major points brought out in this chapter
concerning the statistical effects of quantization noise in compensators. The
process of scaling a digital feedback compensator requires the consideration of the
overall closed-loop control system in which the compensator is embedded. Thus we
had to adapt the methods developed for scaling digital filters to this problem.
Furthermore, when applying the statistical approach to scaling to the set-point LQG
system, we had to consider an alternate configuration for the system. For the
analysis of roundoff noise effects in compensators, we again had to adapt the
techniques used in digital signal processing to consider the effects of the overall
closed-loop system. The development of minimum roundoff noise structures for
compensators required similar adaptations. When these methods were applied to a
specific control system example, we were able to compare different types of
structures in terms of their roundoff noise performance. The pairing and ordering
issues involved with the parallel and cascade structures were shown to be even more
complex for compensators, due to the number of real poles that are common in control
system compensators. Furthermore, the default structure for LQG controllers, the
simple form, was shown to be a poor choice of structure in general for the LQG
compensator.
§6.1 Introduction

The implementation of a discrete-time system described by an ideal
infinite-precision transfer function in finite-precision hardware involves several
important issues. Chapter 5 has discussed the quantization noise problem, and
Chapter 7 will present the issue of limit cycle oscillations. This chapter will
consider the problem of quantizing the infinite-precision coefficients of the
structure so that they may be stored in a finite-length fixed-point binary
representation. As with the roundoff noise question, coefficient quantization
effects are also heavily structure-dependent, and thus the analysis of such effects
is important when selecting a good structure and its required coefficient
wordlength.

Approximating the coefficients of a structure with a finite number of bits will degrade the system's performance relative to the ideal. Given a quantitative performance measure, we can measure the tradeoff between the number of bits and the degradation. Then, having specified an acceptable amount of degradation, we must determine the minimum number of coefficient bits needed to meet this goal, and the structure which has the smallest such wordlength.

Whatever the structure, the fewest total coefficient bits will be required if we allow each coefficient to have a different wordlength; adding a constraint such as uniform wordlength can never reduce the total number of bits needed. However, the resulting complication in the digital hardware, due to non-uniform memory widths and restrictions on the hardware multipliers, makes this superior apportionment of coefficient bits very costly. For this reason a uniform fixed-point coefficient wordlength is typically assumed. This assumption will be carried through in the analysis, assuming $n_c$ fractional bits, a sign bit, and enough integer bits to represent the largest coefficient in the structure. We will also assume that each structure has already been scaled, since the scaling operation can radically change the dynamic range of the coefficients, and hence the required wordlength.

The remainder of this chapter is organized as follows. In section 6.2 we will describe different methods for selecting structures that have small required coefficient wordlengths, and different ways of evaluating the required coefficient wordlength once a structure is selected. In particular, we will discuss a qualitative method for comparing structures based on pole locations, a direct approach to wordlength evaluation, and a statistical approach to structural comparison and wordlength determination. We will show that the statistical method has a very important advantage over any other approach: it can be used as the objective function in an iterative structure optimization procedure (see Chapter 8). Sections 6.3 and 6.4 describe the statistical method in detail for the LQG problem, while section 6.5 presents the direct evaluation procedure. Using the F8 system presented in Chapter 5, various coefficient wordlength results and conclusions are presented in section 6.6. Finally, the joint analysis of coefficient wordlength effects and roundoff noise effects is addressed in section 6.7.

§6.2 Methods of Analysis

Given some measure of performance, there are several methods for calculating the degradation due to coefficient quantization, so that a good structure and the wordlength necessary to meet some allowed degradation level may be selected. Before discussing these methods, we must address one other important question: how are the ideal coefficients to be quantized? The simplest and most common procedure is to round the coefficients to $n_c$ fractional bits. Unfortunately, there is no guarantee that this is the best method in terms of a specific performance metric. In fact, the optimal set of $n_c$-fractional-bit coefficients is usually not the set of rounded values. This fact has given rise to several optimization techniques [67,68,69] for determining the best set of quantized coefficients for a given structure and wordlength. Typically these techniques start near the rounded coefficient set (in discrete coefficient space) and search for minima. Unfortunately, these methods can be extremely time-consuming, and the resulting coefficient set is not necessarily much better than that obtained by rounding. Consequently, we will assume that finite-wordlength coefficients are produced by rounding the ideal values.
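A minimal sketch of this rounding rule, assuming double-precision ideal coefficients: each coefficient is rounded to the nearest multiple of $\Delta = 2^{-n_c}$, and the integer-bit count is taken from the largest unrounded magnitude. The helper names `round_coefficient` and `integer_bits` are hypothetical, and breaking ties away from zero is an assumption, since the text does not specify a tie rule.

```python
import math
from typing import Sequence


def round_coefficient(c: float, n_c: int) -> float:
    """Round c to the nearest multiple of 2**-n_c, breaking ties away from zero."""
    delta = 2.0 ** -n_c                          # value of the least significant bit
    magnitude = math.floor(abs(c) / delta + 0.5) * delta
    return math.copysign(magnitude, c)


def integer_bits(coeffs: Sequence[float]) -> int:
    """Integer bits I such that 2**I exceeds the largest unrounded coefficient magnitude."""
    largest = max(abs(c) for c in coeffs)
    return 0 if largest < 1.0 else math.floor(math.log2(largest)) + 1


coeffs = [0.3, -1.5, 0.8]                        # hypothetical ideal coefficients
n_c = 3
quantized = [round_coefficient(c, n_c) for c in coeffs]
print(quantized)                                 # [0.25, -1.5, 0.75]
print(integer_bits(coeffs))                      # 1
```

Every rounding error produced this way lies within $\pm\Delta/2$, which is the property the statistical analysis below relies on.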

The effect of a quantized coefficient on any performance measure is essentially a sensitivity question. From a frequency-domain viewpoint, coefficients of finite wordlength imply that there are only a finite number of possible pole and zero locations in the z-plane. Thus one approach to selecting a structure with minimal coefficient quantization effects is to examine a graph, or grid, of these locations; the coefficient sensitivity in an area of high grid density would be small. The structure with the densest grid in the area of the desired poles and zeros would then be chosen. Several structures have been described in terms of pole location grids; for example, the coupled form second-order section of Rader and Gold [70] has a uniform square grid over the entire z-plane, while the direct form II has a non-uniform grid, densest near $z = \pm j$. Avenhaus [34], Abu-El-Haija, Shenoi, and Peterson [71], and Agarwal and Burrus [72] have described second-order sections whose grids are densest near $z = 1$, thus making them excellent for implementing lowpass filters. Avenhaus has also presented other sections and their respective pole location grids. Such a general approach to filter structure selection at least has an intuitive appeal. Of course, there is no guarantee that a structure with high grid density at the desired pole locations will be the best structure in terms of any other measure of performance degradation due to coefficient quantization, especially when using performance measures such as the trace of the error covariance (for a Kalman filter), the performance index J (for LQG systems), or phase margin (for a classical control system).
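To make the grid idea concrete, the sketch below enumerates the realizable upper-half-plane pole positions of two second-order sections with $n_c$ fractional bits: a direct-form denominator $z^2 + b_1 z + b_2$ with $b_1$ and $b_2$ quantized, and a coupled form whose two coefficients are the pole's real and imaginary parts. The coefficient ranges, the target points, and the radius 0.05 are illustrative assumptions, not values from the text, and the function names are hypothetical.

```python
import math


def direct_form_poles(n_c: int) -> list[complex]:
    """Upper-half-plane poles of z^2 + b1*z + b2 with b1, b2 quantized to n_c fractional bits."""
    steps = 2 ** n_c
    delta = 1.0 / steps
    poles = []
    for k in range(-2 * steps + 1, 2 * steps):       # b1 in (-2, 2)
        b1 = k * delta
        for m in range(steps):                       # b2 in [0, 1) keeps |z| < 1
            b2 = m * delta
            disc = b2 - b1 * b1 / 4.0
            if disc > 0.0:                           # complex-conjugate pairs only
                poles.append(complex(-b1 / 2.0, math.sqrt(disc)))
    return poles


def coupled_form_poles(n_c: int) -> list[complex]:
    """Upper-half-plane poles sigma + j*omega with sigma and omega quantized directly."""
    steps = 2 ** n_c
    delta = 1.0 / steps
    poles = []
    for k in range(-steps + 1, steps):
        for m in range(1, steps):
            p = complex(k * delta, m * delta)
            if abs(p) < 1.0:                         # keep only stable positions
                poles.append(p)
    return poles


def count_near(poles: list[complex], target: complex, radius: float) -> int:
    """Number of realizable pole positions within `radius` of `target`."""
    return sum(1 for p in poles if abs(p - target) <= radius)


df = direct_form_poles(6)
cf = coupled_form_poles(6)
for target in (0.9 + 0.05j, 0.9j):
    print(target, count_near(df, target, 0.05), count_near(cf, target, 0.05))
```

With $n_c = 6$ the direct form offers far fewer realizable poles near $z = 0.9 + 0.05j$ than the coupled form, but many more near $z = 0.9j$, in line with the grid shapes described above.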

Given any set of quantized coefficients, the most direct and accurate way to evaluate the effect of finite wordlength on performance is to recompute, for the quantized coefficient values, the entire transfer function, performance index J, phase margin, or whatever quantitative measure is appropriate. In fact, this is the approach taken by Sripad [13] for analyzing the effects of finite-wordlength coefficients. While this method has the virtue of being accurate, it tells us only one point on the performance/wordlength tradeoff curve. The performance measure would have to be reevaluated for each candidate wordlength until the desired degradation level had been bracketed (bounded above and below) by wordlengths differing by only one bit. The larger of the two wordlengths would then be the required coefficient wordlength for that structure. Such a brute-force approach could be quite time-consuming, especially when we wish to compute the required number of bits for several candidate structures.

A more convenient procedure would be one in which a single evaluation established the behavior of the performance/wordlength tradeoff curve. The required wordlength could then be estimated easily from the allowed degradation level. Also, since the wordlength must be integral, some accuracy can be sacrificed to gain simplicity, as long as the required wordlength is not underestimated. More importantly, if the coefficient wordlength estimate is continuous in nature, that is, not confined to an integral number of bits, then it is possible to apply an optimization technique [17] to synthesize better structures. In this procedure, which we will describe in Chapter 8, continuous transformations are applied to an initial structure. These transformations are determined by a gradient search technique based on some continuous, differentiable scalar objective function of the coefficients of the structure. Certainly, if our required wordlength is strictly integral, it is not differentiable.

The concept of a statistical estimate of wordlength has both the advan- 
tages mentioned above. This approach originated in the study of digital filters 
with the work of Knowles and Olcayto [73]. Avenhaus [67] applied this idea to 
the digital filter power transfer function (as a performance measure), and later 
Crochiere [32,74] used the concept with the filter transfer function magnitude 
and a wordlength-optimization procedure. 

The remainder of this section will review the basic development of the statistical wordlength measure for digital filters [74]. Consider a general scalar measure of performance f that is a function of a set of coefficients, and is continuous and differentiable. For example, the error in the transfer function magnitude at a specific frequency, the integrated squared error in the transfer function magnitude, and the performance index for an infinite-time-horizon LQG problem are acceptable measures. With a finite-precision implementation, the resulting f will depend on the N quantized coefficients $(c_1, c_2, \ldots, c_N)$ of the structure. The value of f associated with any particular finite-precision structure will reflect a degradation in performance as compared to the ideal (infinite-precision) value of f. Assume that this degradation df can be expanded in a Taylor series about the ideal value. Keeping only first-order terms,

$$ df(c_1, c_2, \ldots, c_N) \approx \sum_{i=1}^{N} \frac{\partial f}{\partial c_i}\, dc_i \qquad (6.1) $$

where $c_i$ is the coefficient to be rounded, $dc_i$ is the error due to rounding, and $\partial f / \partial c_i$ is the first partial derivative of f evaluated at the unrounded coefficient values. Note that coefficients such as 3, 2, 1, and ½ are normally not affected by rounding and should not be included in the sum (6.1).


If $\Delta$ is the quantization step size $2^{-n_c}$, the fraction represented by the least significant bit of the fixed-point coefficient word, then each $dc_i$ must lie between $\pm\Delta/2$. Given the partial derivatives in (6.1), we could then upper bound the error df, producing a very pessimistic wordlength estimate:

$$ |df| \le \frac{\Delta}{2} \sum_{i=1}^{N} \left| \frac{\partial f}{\partial c_i} \right| \qquad (6.2) $$

The basic idea behind statistical wordlength is to treat an ensemble of structures. Over this ensemble, the coefficient errors $dc_i$ can be thought of as uniformly distributed, zero-mean, uncorrelated random variables, each of variance $\Delta^2/12$. The error df is therefore also zero-mean, with variance:

$$ \sigma_{df}^2 = \frac{\Delta^2}{12} \sum_{i=1}^{N} \left( \frac{\partial f}{\partial c_i} \right)^2 \qquad (6.3) $$
For large N, the central limit theorem can be applied to justify a Gaussian distribution for df. Thus, with a given probability, say 95%, one can determine the variance needed for the error df to remain within some prescribed bound. In other words, 95 out of 100 of the structures in the ensemble will result in systems where df remains within this bound.

From a table of the Gaussian distribution,

$$ \Pr\left[\, |df| \le 2\sigma_{df} \,\right] \approx 0.954 \qquad (6.4) $$
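The tabulated value in (6.4) can be checked with the error function. The same computation gives the one-sided probability of lying below the mean plus two standard deviations, which is the relevant figure when only an upper bound on the degradation matters.

```python
import math

# Gaussian probabilities for a two-sigma bound
two_sided = math.erf(2.0 / math.sqrt(2.0))       # Pr[|x - mean| <= 2*sigma]
one_sided = 0.5 * (1.0 + two_sided)              # Pr[x <= mean + 2*sigma]
print(round(two_sided, 4), round(one_sided, 4))  # 0.9545 0.9772
```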


If the quantity of interest f is constrained to lie within $\pm F_0$ (the degradation level) of the ideal f, then (6.4) implies that $2\sigma_{df}$ should equal $F_0$. This result can be combined with (6.3) to produce an estimate of the parameter $\Delta$:

$$ \Delta^2 = \frac{3 F_0^2}{\sum_{i=1}^{N} \left( \partial f / \partial c_i \right)^2} \qquad (6.5) $$


Given $\Delta$, the statistical wordlength can be defined to be:

$$ \mathrm{SWL} = I + \log_2 \frac{1}{\Delta} \qquad (6.6) $$


The first term I in (6.6) represents the number of bits necessary to represent the integer portion of the coefficient word (bits to the left of the fixed binary point), and the second term gives the number of bits $n_c$ necessary for the fractional portion of the coefficient word (bits to the right of the binary point). The sign bit is not included in this expression.
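A minimal sketch of (6.3)-(6.6), assuming the sensitivities $\partial f/\partial c_i$ have already been computed at the unrounded coefficients (exactly representable coefficients left out). The function name and its arguments are hypothetical. Note that the result is not confined to integers, which is the continuous property discussed earlier in this section.

```python
import math
from typing import Sequence


def statistical_wordlength(sensitivities: Sequence[float], F0: float, int_bits: int) -> float:
    """First-order statistical wordlength following (6.3)-(6.6).

    sensitivities: partial derivatives df/dc_i at the unrounded coefficients
    F0:            allowed deviation of f from its ideal value (two-sigma point)
    int_bits:      integer bits I needed for the largest coefficient
    """
    s = sum(g * g for g in sensitivities)
    if s == 0.0:
        raise ValueError("all first-order sensitivities are zero")
    delta_sq = 3.0 * F0 ** 2 / s                        # (6.5): from 2*sigma_df = F0
    return int_bits + 0.5 * math.log2(1.0 / delta_sq)   # (6.6): I + log2(1/Delta)


print(statistical_wordlength([1.0, 1.0, 1.0], F0=2.0 ** -10, int_bits=2))  # 12.0
```

With three unit sensitivities and $F_0 = 2^{-10}$, (6.5) gives $\Delta^2 = 2^{-20}$, so two integer bits plus ten fractional bits yield an SWL of 12. The explicit error for all-zero sensitivities anticipates the LQG case, where the first-order terms vanish.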

In the digital filter area, Crochiere [31,32,74] presents a number of results comparing the statistical wordlength of structures using the transfer function magnitude as the performance measure f. Since this choice of f is frequency-dependent, the resulting estimate is also frequency-dependent. The final wordlength can be selected as the maximum of the estimates over the frequency range of interest. In the examples treated by Crochiere, the statistical wordlength estimate was 1 to 3 bits conservative compared to the actual minimum number of bits necessary to just meet the transfer function error limit. In a related work by Chan and Rabiner [75], which considered a large number of finite impulse response filters and a similar statistical approach to coefficient wordlength, the resulting 95% confidence level estimates were also observed to be conservative. Crochiere [32,74] was also able to use statistical wordlength as the basis for a filter optimization procedure quite different from the technique we will present in Chapter 8 (but not applicable to LQG compensators).


§6.3 Statistical Wordlength and LQG Systems

As mentioned in Chapters 1 and 2, it is natural to use the performance index J of (2.3) as the measure of performance f for a steady-state LQG system. Using the approach of the previous section, the change in J would be estimated by:

$$ dJ(c_1, c_2, \ldots, c_N) \approx \sum_{i=1}^{N} \frac{\partial J}{\partial c_i}\, dc_i \qquad (6.7) $$
However, the optimal nature of the LQG control problem forces all the first-order sensitivities $\partial J / \partial c_i$ to be zero. Therefore a higher-order approximation is necessary:

$$ dJ \approx \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\partial^2 J}{\partial c_i\, \partial c_j}\, dc_i\, dc_j \qquad (6.8) $$
The use of second-order terms (not used in digital filter analysis) is a unique aspect of our statistical wordlength formulation. However, these terms would be implicit in any statistical estimate based on the error in an optimized scalar performance measure. If a digital filter were designed by minimizing the integrated squared error between the desired and actual filter transfer function magnitude characteristics, then a statistical wordlength estimate based on this performance measure would have to use second-order sensitivities, since all first-order sensitivities would be zero. Thus our statistical wordlength derivation could serve as an extension to the techniques of digital signal processing. However, when the overall filter statistical estimate is taken to be the maximum over a set of estimates made at specific frequencies (each based on the transfer function magnitude error at that frequency), the first-order sensitivities for each of those estimates are non-zero no matter how the original filter was designed. This was the case considered by Crochiere. In fact, digital filters are frequently not designed by minimizing a differentiable scalar criterion, in which case one would have to use Crochiere's approach to develop a statistical wordlength estimate.

Proceeding from (6.8), recall that all the errors $dc_i$ and $dc_j$ are assumed to be uncorrelated for $i \ne j$. Thus, the mean of dJ will no longer be zero:

$$ E(dJ) = \frac{\Delta^2}{24} \sum_{i=1}^{N} \frac{\partial^2 J}{\partial c_i^2} \qquad (6.9) $$

For convenience, define the random variable e to be the square of $dc_i$; its mean and variance can be shown to be $\bar{e} = E(e) = \Delta^2/12$ and $\sigma_e^2 = \Delta^4/180$. The second moment and variance of dJ can then be written as follows:
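Under the assumptions just stated (independent $dc_i$, each uniform on $[-\Delta/2, \Delta/2]$, so that $E(dc_i^4) = \Delta^4/80$), the moments of dJ in (6.8) can be worked out directly. A sketch, writing $H_{ij}$ for $\partial^2 J / \partial c_i \partial c_j$ purely as shorthand:

```latex
\begin{aligned}
E(dJ) &= \frac{1}{2}\sum_{i} H_{ii}\,\bar{e} = \frac{\Delta^2}{24}\sum_{i} H_{ii}, \\
\sigma_{dJ}^2 &= \frac{1}{4}\Big[\sigma_e^2 \sum_{i} H_{ii}^2
      + 2\Big(\frac{\Delta^2}{12}\Big)^2 \sum_{i \neq j} H_{ij}^2\Big]
  = \Delta^4\Big[\frac{1}{720}\sum_{i} H_{ii}^2 + \frac{1}{288}\sum_{i \neq j} H_{ij}^2\Big], \\
E(dJ^2) &= \sigma_{dJ}^2 + E(dJ)^2 .
\end{aligned}
```

The off-diagonal terms enter only through their second moments because each distinct product $dc_i\,dc_j$ ($i < j$) has zero mean and is uncorrelated with the squared terms and with every other distinct product. Since both the mean and the standard deviation scale with $\Delta^2$, a two-sigma condition on dJ yields $\Delta^2$ in closed form.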
Recall the application of the central limit theorem in section 6.2. We can make the same assumption for our higher-order statistical wordlength derivation. For the usual digital filtering estimate, coefficient quantization could either decrease or increase the error in the transfer function magnitude at any specific frequency; this error was zero-mean. In the control case, since the ideal compensator is optimal, the value of J can only increase under coefficient quantization. Thus we need only specify the maximum allowed value of J including the degradation due to coefficient quantization: $J_\infty + E_0$. Following the general approach of section 6.2, we must relate this value to the two-sigma point in the distribution for dJ (see Figure 6-1):

Figure 6-1: Probability Density of dJ

$$ E(dJ) + 2\sigma_{dJ} = E_0 \qquad (6.12) $$

This choice of $\sigma_{dJ}$ gives approximately a 97.7% confidence level in terms of remaining below the allowed deviation $E_0$. Combining (6.11), (6.12), and the values of $\bar{e}$ and $\sigma_e$, we can derive an expression for $\Delta^2$:
Using (6.6), the SWL can be written:

$$ \mathrm{SWL} = I + \frac{1}{2} \log_2 \frac{1}{\Delta^2} \qquad (6.14) $$

There are several important distinctions between the statistical wordlength method as described in section 6.2 and the expression (6.14). First, the requirement of second-derivative terms has led to a fairly complex expression for the SWL, so an efficient computational procedure will be critical. Second, since the performance index J is not frequency dependent, neither is the statistical wordlength estimate. Only one evaluation will be needed, rather than one per frequency as with the transfer-function-based filter wordlength estimate. Another distinction involves the Gaussian assumption. The analysis in (6.12)-(6.14) applied the central limit theorem to justify this distribution. Yet we know that the distribution of dJ must be one-sided (dJ cannot be negative). Thus the 97.7% probability figure may not be as accurate as the digital filter probability of 95%. However, it is not really important whether 95 out of 100 structures have statistical estimates that are conservative, or 85 out of 100, and so forth.

The final distinction between the usual filtering estimate described in section 6.2 and the LQG controller estimate is the non-zero mean degradation dJ. For the filtering case, the mean degradation in the transfer function is zero. For the LQG development, it is possible to form an estimate without taking into account the standard deviation of the error dJ. If we set the mean degradation value to equal the allowed degradation $E_0$, then using (6.9):

$$ E_0 = \frac{\Delta^2}{24} \sum_{i=1}^{N} \frac{\partial^2 J}{\partial c_i^2} \qquad (6.15) $$

From (6.15), we can write an expression for $\Delta^2$:

$$ \Delta^2 = \frac{24 E_0}{\sum_{i=1}^{N} \partial^2 J / \partial c_i^2} \qquad (6.16) $$

A mean statistical wordlength (MSWL) can now be defined, using (6.14) and (6.16):

$$ \mathrm{MSWL} = I + \frac{1}{2} \log_2 \left( \frac{1}{24 E_0} \sum_{i=1}^{N} \frac{\partial^2 J}{\partial c_i^2} \right) \qquad (6.17) $$

The interpretation of this MSWL estimate is that one half of the structures using this wordlength will have more degradation than $E_0$, and one half will have less.

Whether or not the MSWL is a useful estimate depends on the width of the dJ distribution in Figure 6-1 and on how this distribution changes from structure to structure. In other words, it will depend on how closely the SWL and MSWL estimates track each other. Consequently, we will compute both estimates for a selection of structures. The advantage of the MSWL is clear: reduced complexity and, hopefully, significantly less evaluation time.
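A minimal sketch of the two estimates, assuming the matrix `H` of second partials $\partial^2 J/\partial c_i \partial c_j$ at the ideal coefficients is already available (in practice it would come from the procedure of section 6.4). The mean of dJ follows (6.9); the variance used here is derived under the independence assumption of this section, weighting the squared diagonal terms by $\Delta^4/720$ and the squared off-diagonal terms by $\Delta^4/288$. The SWL then enforces mean plus two standard deviations equal to $E_0$, while the MSWL enforces the mean alone. The function names and the 2×2 identity Hessian in the usage lines are hypothetical.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def dj_moments(H: Matrix, delta: float) -> tuple[float, float]:
    """Mean and variance of dJ = 1/2 * sum_ij H_ij dc_i dc_j, dc_i ~ U(-delta/2, delta/2) i.i.d."""
    n = len(H)
    diag_sum = sum(H[i][i] for i in range(n))
    diag_sq = sum(H[i][i] ** 2 for i in range(n))
    off_sq = sum(H[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
    d2 = delta * delta
    mean = d2 * diag_sum / 24.0
    var = d2 * d2 * (diag_sq / 720.0 + off_sq / 288.0)
    return mean, var


def swl_lqg(H: Matrix, E0: float, int_bits: int) -> float:
    """Second-order SWL: pick Delta so that mean + 2*sigma of dJ equals E0."""
    a, var_unit = dj_moments(H, 1.0)       # mean = a*Delta^2, sigma = b*Delta^2
    b = math.sqrt(var_unit)
    delta_sq = E0 / (a + 2.0 * b)
    return int_bits + 0.5 * math.log2(1.0 / delta_sq)


def mswl_lqg(H: Matrix, E0: float, int_bits: int) -> float:
    """Mean SWL: pick Delta so that the mean of dJ alone equals E0."""
    a, _ = dj_moments(H, 1.0)
    delta_sq = E0 / a
    return int_bits + 0.5 * math.log2(1.0 / delta_sq)


H = [[1.0, 0.0], [0.0, 1.0]]               # hypothetical Hessian of J
E0 = 2.0 ** -20 / 12.0
print(mswl_lqg(H, E0, 0), swl_lqg(H, E0, 0))
```

Because the MSWL uses only the diagonal of H, it needs only the $i = j$ second partials, which is the source of its computational advantage; it also always yields the smaller of the two wordlengths, since it targets the mean rather than the two-sigma point.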

At this point it is convenient to mention the analysis of sub-optimal compensators. If the sub-optimal compensator results from a parameter optimization problem [19,20], then the first-order sensitivities will all be zero and the statistical wordlength approach developed in this chapter can be used. If the sub-optimal design is only an approximation of an optimal design, then we can still apply this method. The only difference would be the inclusion of the first-derivative terms $\partial J / \partial c_i$ (first-order sensitivities); these terms would be non-zero since the compensator is not optimal, or even locally optimal as in the parameter optimization designs.

Again, it is important to realize that the estimates derived in this section for LQG compensator coefficient wordlength could also be useful for digital filters, as long as the filter design optimizes some scalar differentiable objective function.

§6.4 Computing the Statistical Wordlength

As in Chapter 5, the trace form of J will be convenient for computing statistical wordlength. Recall the following two equations from Chapter 5:

$$ J = \operatorname{trace}(TZ) \qquad (6.18) $$

$$ Z = AZA' + C \qquad (6.19) $$

where T contains the weighting matrices Q, M, and R as in (5.34), and A and C are defined by:
|° ('* , 12 e 2'* r l2 , )J

Assume that the structure has already been scaled, so that the matrices $\Psi_m$ contain the infinite-precision scaled compensator parameters. If (6.18) is evaluated with these infinite-precision parameters, the resulting value of the performance index $J_\infty$ will be independent of the structure chosen. However, the partial derivatives of J with respect to the coefficients of the structure, evaluated at the ideal coefficient values, will of course be structure-dependent. The second partial derivative of (6.18) can be written:

$$ \frac{\partial^2 J}{\partial c_i\, \partial c_j} = \operatorname{trace}\left( T\, \frac{\partial^2 Z}{\partial c_i\, \partial c_j} \right) \qquad (6.20) $$
Thus the partials of Z (each a matrix) must be computed efficiently. Taking the first derivative of (6.19) with respect to $c_i$ produces:

where

(6.21)
The second partials of Z can now be written:

where

Rather than solving (6.22) $N^2$ times, which would be extremely time-consuming, we can apply the adjoint method used in Chapter 5. Equations (6.20) and (6.22) can be replaced by the following two equations:

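$$ \frac{\partial^2 J}{\partial c_i\, \partial c_j} = \operatorname{trace}\left[ U \left( X_{ij} + X_{ij}' \right) \right] = 2\, \operatorname{trace}\left( U X_{ij} \right) \qquad (6.23) $$

$$ U = A'UA + T \qquad (6.24) $$

where U, A, and T are all $(2n+1)\times(2n+1)$ matrices, and A and T can be found in (6.19) and (5.34).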
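The adjoint device is easiest to see on a first derivative. The sketch below, assuming NumPy and SciPy, perturbs one entry of A through a hypothetical unit-element matrix `dA` (C held fixed for brevity), obtains $\partial J/\partial c$ from a single extra Lyapunov solve for U, and checks it against a central finite difference. Because $\operatorname{trace}(T\,dZ) = \operatorname{trace}\big((U - A'UA)\,dZ\big) = \operatorname{trace}\big(U(dZ - A\,dZ\,A')\big)$, only the forcing term of the dZ equation is needed; the second partials in (6.23) follow the same pattern. The random test matrices and function names are illustrative.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def performance(A: np.ndarray, C: np.ndarray, T: np.ndarray) -> float:
    """J = trace(T Z) where Z solves Z = A Z A' + C, cf. (6.18)-(6.19)."""
    Z = solve_discrete_lyapunov(A, C)              # solves A Z A^H - Z + C = 0
    return float(np.trace(T @ Z))


def adjoint_gradient(A: np.ndarray, C: np.ndarray, T: np.ndarray, dA: np.ndarray) -> float:
    """dJ/dc for A(c) = A + c*dA (C held fixed), using one adjoint solve U = A'UA + T."""
    Z = solve_discrete_lyapunov(A, C)
    U = solve_discrete_lyapunov(A.T, T)            # adjoint equation, cf. (6.24)
    W = dA @ Z @ A.T + A @ Z @ dA.T                # forcing term of the equation for dZ/dc
    return float(np.trace(U @ W))


rng = np.random.default_rng(0)
n = 3
M = rng.standard_normal((n, n))
A = 0.9 * M / np.max(np.abs(np.linalg.eigvals(M)))   # stable: spectral radius 0.9
B = rng.standard_normal((n, n))
C = B @ B.T
T = np.eye(n)                                        # hypothetical weighting matrix
dA = np.zeros((n, n))
dA[0, 1] = 1.0                                       # unit-element matrix for one coefficient

g_adj = adjoint_gradient(A, C, T, dA)
h = 1e-6
g_fd = (performance(A + h * dA, C, T) - performance(A - h * dA, C, T)) / (2 * h)
print(g_adj, g_fd)
```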

Further simplification is possible when evaluating (6.23) once U is computed. The matrices A and $\Psi_{12}\Theta_2\Psi_{12}'$ can be expressed in terms of $\Psi_m$:

where $E_m$ is defined to be a unit element matrix of the same dimensions as $\Psi_m$. This matrix is all zero except for a single unity entry at the location of $c_i$. Similarly, if $c_j$ is located in $\Psi_n$ at index (r,s), we can write:
We can infer from (6.28) that if $c_i$ and $c_j$ are in the same matrix $\Psi_m$ (thus $m = n$), then the corresponding second partial derivative term must be zero. This fact simplifies the calculation of $X_{ij}$ to some extent (significantly so for the MSWL estimate, which only requires $X_{ij}$ for $i = j$). Appendix C presents further details regarding the evaluation of (6.23).

Unfortunately, the evaluation of (6.23) still requires the computation of equation (6.21) for all N coefficients. However, these computations can also be simplified. The Lyapunov solution method used in this analysis is that of Barraud [76], and has several distinct parts. Given an equation like (6.19), this method will:

(1) Compute an orthogonal transformation matrix P that converts A to upper Schur form (upper triangular except for possible nonzero entries on the first subdiagonal): $A_s = P'AP$

(2) Use P to transform C to $C_s$: $C_s = P'CP$

(3) Solve the transformed equation $Z_s = A_s Z_s A_s' + C_s$ by a back substitution technique

(4) Transform the result $Z_s$ to Z via $Z = P Z_s P'$

The number of operations involved in each step is proportional to $(2n+1)^3$ if Z, A, and C are $(2n+1)\times(2n+1)$. However, by far the majority of the computations are involved in step 1, which performs an eigenvalue-eigenvector analysis of A. Step 3 requires approximately 5% to 10% of the total time, depending on the particular A matrix. The important point to realize is that step 1 need only be performed once for all N equations (6.21). In fact, steps 2 and 4 can also be simplified by including the P and P' multiplications in the matrices $M_1$ and $M_2$ described above for the $X_{ij}$ terms. Using this method, there will still be a proportionality to $N(2n+1)^3$ in computing (6.21), but the constant of proportionality will be many times smaller than for the full four-step procedure.
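A minimal sketch of the reuse idea, assuming NumPy and SciPy: the Schur factorization of step 1 is computed once and then shared by several right-hand sides, each of which needs only steps 2 to 4. For brevity it uses the complex Schur form, so the transformed matrix (named `S` here to avoid a clash with the weighting matrix T) is strictly upper triangular and step 3 reduces to scalar back substitution. Barraud's method works with the real Schur form instead, and the naive inner sum here costs more than a careful $O((2n+1)^3)$ implementation. Results are checked against `scipy.linalg.solve_discrete_lyapunov`.

```python
import numpy as np
from scipy.linalg import schur, solve_discrete_lyapunov


def schur_factor(A: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Step (1), done once: A = Q S Q^H with S upper triangular (complex Schur form)."""
    S, Q = schur(A, output="complex")
    return S, Q


def stein_solve(S: np.ndarray, Q: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Solve Z = A Z A' + C for real A and C, reusing the Schur factors of A."""
    n = S.shape[0]
    Cs = Q.conj().T @ C @ Q                        # step (2): C_s = Q^H C Q
    Y = np.zeros((n, n), dtype=complex)
    for i in range(n - 1, -1, -1):                 # step (3): back substitution,
        for j in range(n - 1, -1, -1):             # starting at the bottom-right corner
            # Y[i, j] is still zero here, so s sums only already-solved entries
            s = S[i, i:] @ Y[i:, j:] @ S[j, j:].conj()
            Y[i, j] = (Cs[i, j] + s) / (1.0 - S[i, i] * np.conj(S[j, j]))
    Z = Q @ Y @ Q.conj().T                         # step (4): back to original coordinates
    return Z.real                                  # imaginary part is roundoff for real data


rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = 0.8 * M / np.max(np.abs(np.linalg.eigvals(M)))   # stable: spectral radius 0.8
S, Q = schur_factor(A)                               # factor once ...
for _ in range(3):                                   # ... then solve several equations
    B = rng.standard_normal((5, 5))
    C = B @ B.T
    print(np.allclose(stein_solve(S, Q, C), solve_discrete_lyapunov(A, C)))
```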

In summary, the computational procedure for statistical wordlength primarily involves the second derivatives of J required for (6.10). Assuming that computation time is dominated by the number of multiplies, the computation time depends approximately on the number of coefficients N and the (augmented) system order 2n+1 as follows:

$$ t_{\mathrm{SWL}} \propto N^2 (2n+1)^2 + N (2n+1)^3 + (2n+1)^3 $$

For the MSWL estimate, this proportionality will be reduced:

$$ t_{\mathrm{MSWL}} \propto N (2n+1)^3 + (2n+1)^3 $$

Thus, as N increases, the MSWL estimate becomes computationally more and more efficient compared with the SWL estimate.

§6.5 Direct Wordlength Computation

For comparison, it is important to include the direct method for determining the coefficient wordlength required to keep the degradation within the level $E_0$. Basically, this procedure involves selecting a test wordlength, rounding the coefficients to that wordlength, and then forming the (finite-precision) matrices $\Psi_m$, A, and C. Using these finite-precision parameters, the Lyapunov equation (6.19) must be solved and the trace (6.18) evaluated. The resulting value of J can be compared to $J_\infty + E_0$, and a decision then made whether to move the test wordlength up or down.

If the performance index were strictly monotonic in the coefficient wordlength, then a binary search algorithm could be designed that would always succeed in finding the required wordlength. For example, starting at some large initial test wordlength, one could decrement the test wordlength 10 bits at a time until the performance index exceeded $J_\infty + E_0$, then increment the test wordlength in smaller steps until the performance index was below $J_\infty + E_0$, and so forth. However, J need not be strictly monotonic in wordlength, since the coefficient rounding operation is so nonlinear; J is only roughly monotonic. Thus, the search procedure must try to account for possible anomalies in the behavior of J. One other pitfall must be avoided: if the test wordlength is so small that the resulting feedback system is unstable, then the computed J value will be meaningless. One simple way to test for this possibility would be to examine the resulting eigenvalues, which are a by-product of Barraud's Lyapunov solution method.

The method we have used is based on the above discussion. After loosely bracketing the allowed degradation $E_0$ with two test wordlengths, the lower of which is tested to guarantee stability, an exponential curve is fit to these two points. Using $E_0$ and this curve, a reasonable choice of a new test wordlength can be made. From this point, the test wordlength is stepped one bit at a time until the required wordlength is established. The details of the algorithm are shown below:
(1) Bracket the wordlength w with the initial values wmax = 48 and wmin = 0. Initialize the increment i at 10, and set the initial value of w near the value wmax. Compute the ideal $J_\infty$ using the double-precision coefficient values, and add the allowed level of degradation to produce the desired performance $J_0$.

(2) Decrement the wordlength w by i.

(3) Test for a negative wordlength w. If found, set w to 1.

(4) Round the ideal coefficient values to wordlength w, and compute the resulting test value $J_t$ of the performance index.

(5) Test for instability by comparing $J_t$ to $J_\infty$. If $J_t$ is smaller, the system with coefficients of wordlength w is unstable. In that case, increment w by i, halve the increment size, and return to step (2). Otherwise, if $J_t$ is larger than $J_\infty$, continue.

(6) Test to see if $J_t$ is between $J_\infty$ and $J_0$. If so, set wmax to the current value of w and return to step (2). Otherwise, set wmin to the current value of w and continue. We have now bracketed the required wordlength with wmax and wmin, and know the performance levels for each of these wordlengths.

(7) Using the two wordlength/performance points found in step (6), and also the ideal performance value $J_\infty$ (associated with some very large wordlength, say 100), fit an exponential curve to describe the performance index as a function of the wordlength. Interpolate to find a next guess at the required wordlength. Round the coefficients to this wordlength and compute the resulting performance index.

(8) If this value is greater than $J_0$, increase w one bit at a time until the resulting performance level is below $J_0$. The corresponding w will be the required wordlength. If, however, the performance level from step (7) is below $J_0$, decrease w one bit at a time until the resulting performance level is above $J_0$. The corresponding wordlength w, plus one, will be the required wordlength.
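The steps above can be sketched in Python as follows. `J_of` is a hypothetical callback standing in for step (4), that is, rounding the coefficients to w bits and solving the Lyapunov equation; any value below $J_\infty$ is read as instability, as in step (5). The model $J(w) = J_\infty + a e^{bw}$ is one reasonable reading of the exponential fit in step (7), since the text does not give the curve's exact form, and the treatment of the clipped $w = 1$ case is an assumption of this sketch. The toy performance function in the usage lines is invented purely for illustration.

```python
import math
from typing import Callable


def direct_wordlength(J_of: Callable[[int], float], J_inf: float, E0: float,
                      wmax: int = 48, step: int = 10) -> int:
    """Sketch of the direct (TWL) search; assumes the spec is met at w = wmax."""
    J0 = J_inf + E0

    def acceptable(w: int) -> bool:
        J = J_of(w)
        return J_inf <= J <= J0

    # Steps (2)-(6): move down from wmax until the specification is violated.
    while True:
        w = max(wmax - step, 1)
        Jt = J_of(w)
        if Jt < J_inf:                       # step (5): unstable at this wordlength
            if step == 1:
                return wmax                  # wmax - 1 is unstable, wmax is acceptable
            step = max(step // 2, 1)
            continue
        if Jt <= J0:                         # step (6): still acceptable
            wmax = w
            if w == 1:
                return 1
            continue
        wmin, J_hi = w, Jt                   # bracket found: wmin fails, wmax passes
        break

    # Step (7): fit log(J - J_inf) linearly in w through the two bracketing points.
    J_lo = J_of(wmax)
    d_hi, d_lo = J_hi - J_inf, max(J_lo - J_inf, 1e-300)
    b = (math.log(d_lo) - math.log(d_hi)) / (wmax - wmin)
    if b < 0.0:
        guess = wmin + (math.log(E0) - math.log(d_hi)) / b
        w = min(max(math.ceil(guess), wmin + 1), wmax)
    else:
        w = wmax

    # Step (8): step one bit at a time to the smallest acceptable wordlength.
    if not acceptable(w):
        while not acceptable(w):
            w += 1
        return w
    while w - 1 >= 1 and acceptable(w - 1):
        w -= 1
    return w


def toy_J(w: int) -> float:
    # Hypothetical performance index: "unstable" below 3 bits, else J_inf + 100 * 4**-w
    return 0.5 if w < 3 else 1.0 + 100.0 * 4.0 ** (-w)


print(direct_wordlength(toy_J, J_inf=1.0, E0=0.05))   # 6
```

Because the toy index is exactly exponential, the step (7) fit lands on the answer and step (8) only confirms it; with a real compensator the one-bit stepping absorbs the misfit and the non-monotonic anomalies discussed above.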

The direct algorithm may be time-consuming compared to the statistical method because it requires repeated solutions of the Lyapunov equation (6.19) until we bracket the desired performance, and no simplifications are possible from one solution to the next since each finite-precision A matrix is different. If an average of $n_i$ iterations are required to establish the required wordlength, then the dominant number of multiply operations required to compute this true wordlength (TWL) is proportional to:

$$ t_{\mathrm{TWL}} \propto n_i (2n+1)^3 $$

Thus, a comparison between the statistical estimates SWL and MSWL and the TWL described above will depend upon n, $n_i$, N, and the constants of proportionality. However, as the number of coefficients increases, the statistical estimates will become less and less efficient, while the true wordlength computation time remains essentially constant. Recall, though, that the statistical estimate is still useful as the basis for a wordlength optimization procedure, as discussed in Chapter 8. The true wordlength method could not be used for such a procedure, since it is not continuous and thus not differentiable.
§6.6 The F8 System and the Coefficient Wordlength Issue

The effects of finite coefficient wordlength were evaluated for the F8 system example and the ten structures described in Chapter 5. The results are presented in Figure 6-2 using the following format. Column 1 lists the number of integer bits I required for the coefficient word; this value is obtained from the largest scaled coefficient value (see Figure 5-8). The next three columns list the statistical estimates SWL and MSWL, and finally the true wordlength (TWL) as evaluated in section 6.5. In each case, the execution time in seconds for each wordlength determination method is listed in parentheses following each entry. These times are subject to some small amount of uncertainty depending on specific run-time conditions, so they must be regarded as approximate. Again, the wordlengths listed represent the number of coefficient bits (not including the sign) required to achieve at most a 5% increase in the performance index J. Finally, the last column of Figure 6-2 lists the number of bits by which the SWL estimate exceeds the actual required wordlength.
structure               I     SWL            MSWL           TWL         SWL-TWL
a) direct form II       16    35.99 (0.81)   35.05 (0.70)   32 (1.2)    3.99
b) parallel d.f. II     1     6.84 (0.93)    6.16 (0.78)    6 (1.08)    0.84
c) parallel d.f. II     4     12.38 (0.87)   11.52 (0.78)   11 (1.26)   1.38
d) parallel d.f. II     4     19.02 (0.86)   18.14 (0.77)   13 (1.08)   6.02
e) 1-level from (c)     3     11.08 (0.90)   10.22 (0.78)   10 (1.19)   1.08
f) block optimal        1     7.02 (1.26)    6.2 (0.91)     7 (1.11)    0.02
g) cascade, d.f. II     11    26.25 (0.83)   25.38 (0.72)   21 (1.21)   5.25
h) cascade, d.f. II     6     14.61 (0.86)   13.81 (0.72)   14 (1.36)   0.61
i) cascade, d.f. I      9     24.25 (0.84)   23.38 (0.71)   20 (1.1)    4.25
j) simple               1     9.06 (2.44)    8.26 (1.29)    9 (1.71)    0.06

Figure 6-2: F8 Coefficient Wordlength Results


A great deal of information may be drawn from figure 6-2. First, we can discuss the performance of the ten structures with regard to coefficient wordlength. Referring to the TWL values, we can see that the parallel structure (b), using first- and second-order direct form II sections, and the block optimal parallel structure (f) performed the best, needing only 6 and 7 bits, respectively. Quite acceptable performance was also achieved with the simple structure (j) (9 bits), the one-level parallel structure (e) (10 bits), and the parallel structure (c) (11 bits). As with the roundoff noise results of Chapter 5, the direct form II structure (a) performed the worst. For the two parallel and two cascade structures using all second-order sections but with two different pole pairings, the pairing that was better for roundoff noise ((c) and (h)) was also superior for coefficient sensitivity: 2 bits better for the parallel case, and 7 bits better for the cascade.

In fact, if we rank the structures on the basis of their required coefficient wordlengths (b, f, j, e, c, d, h, i, g, a) and then also on the basis of their signal variable wordlengths/roundoff noise performance (f, b, j, e, c, h, d, i, g, a), we can see a very strong correlation. The orderings are nearly identical: only the adjacent structures b and f are interchanged, as are h and d. The correlation between good roundoff noise performance and low coefficient sensitivity has been well publicized for digital filter structures [17,40,48,77,78]. Of course, these results pertain to the sensitivity of the transfer function magnitude to its coefficients. From our results, this correlation seems to carry directly over to the control compensator setting.
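As a quick quantitative check on the correlation just described, the following Python sketch (an illustration added here, not part of the original analysis) computes Spearman's rank correlation coefficient for the two orderings. It assumes there are no ties, so the closed form rho = 1 - 6*sum(d^2)/(n(n^2 - 1)) applies.

```python
# Spearman rank correlation between the two structure rankings quoted above.
# Assumes no ties, so the closed form 1 - 6*sum(d^2)/(n*(n^2 - 1)) applies.

coef_order = ["b", "f", "j", "e", "c", "d", "h", "i", "g", "a"]      # by TWL
roundoff_order = ["f", "b", "j", "e", "c", "h", "d", "i", "g", "a"]  # by roundoff noise


def spearman_rho(order1, order2):
    """Return Spearman's rho for two complete rankings of the same items."""
    if sorted(order1) != sorted(order2):
        raise ValueError("rankings must contain the same items")
    rank2 = {item: pos for pos, item in enumerate(order2)}
    d_squared = sum((pos - rank2[item]) ** 2 for pos, item in enumerate(order1))
    n = len(order1)
    return 1 - 6 * d_squared / (n * (n * n - 1))


rho = spearman_rho(coef_order, roundoff_order)
print(f"Spearman rho = {rho:.4f}")  # prints: Spearman rho = 0.9758
```

Only the two adjacent swaps (b/f and h/d) contribute to the sum of squared rank differences (4 in total), which is why rho = 1 - 24/990 is so close to 1.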

One point to be cognizant of is that certain coefficients in a structure, when rounded to the TWL wordlength, may in fact become zero or unity, thus eliminating them as multipliers. This situation occurs in the simple structure (j), reducing the number of coefficients from 50 to 40; in the block optimal structure (f), reducing the number of coefficients from 25 to 24; and in the parallel structure (b), reducing the number of coefficients from 17 to 16. Such reductions should factor into the structure selection procedure.
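Checking for such eliminated multipliers is easy to automate. In this sketch, coefficients are rounded to a given number of fractional bits and those that become 0 or +/-1 are flagged. The values -.9938344 and 1.9938281 are two coefficients quoted elsewhere in this section; 0.004, 0.35, and the choice of 6 fractional bits are hypothetical.

```python
# Round coefficients to a given number of fractional bits and flag those that
# become 0 or +/-1, i.e., no longer require a hardware multiplier.
# 0.004 and 0.35 are hypothetical; the other two values appear in this section.

def quantize(c, frac_bits):
    """Round c to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return round(c / step) * step


def trivial_multipliers(coeffs, frac_bits):
    """Return (original, rounded) pairs whose rounded value is 0 or +/-1."""
    out = []
    for c in coeffs:
        q = quantize(c, frac_bits)
        if q == 0.0 or abs(q) == 1.0:
            out.append((c, q))
    return out


coeffs = [0.004, -0.9938344, 0.35, 1.9938281]
for c, q in trivial_multipliers(coeffs, frac_bits=6):
    print(f"{c:+.7f} -> {q:+.5f}")
```

At this wordlength 1.9938281 rounds to 2.0; it is not a unity multiplier, but a power of two could likewise be realized as a shift.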

Taking the number of multiplies, number of precedence levels, roundoff noise performance, and required coefficient wordlength all into account, parallel structure (b), which uses first-order sections for real poles and second-order sections for complex poles, is probably the best choice. To achieve an overall 3% maximum increase in J with this structure, we could use an 8-bit A/D converter, 8-bit coefficients, and 10-bit signal variables. (Due to the quadratic nature of J, each extra bit reduces the increase in J by approximately a factor of four.) Each of these wordlengths includes the sign bit. If circumstances required a one-level structure for a short sampling period T, then we would probably use the block optimal structure and 24 hardware multipliers. Any final decision as to structure selection is, of course, application-dependent.
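The factor-of-four rule of thumb can be turned into a quick wordlength calculator. This is a sketch that assumes the rule holds exactly; the reference point (a 5% increase at 6 fractional coefficient bits) is hypothetical, chosen only to resemble structure (b).

```python
# Rule of thumb from the text: because J depends quadratically on the
# quantization errors, each extra bit cuts the increase in J by about 4x.

def predicted_increase(increase_ref, bits_ref, bits):
    """Predicted fractional increase in J at `bits`, given a reference point."""
    return increase_ref * 4.0 ** (bits_ref - bits)


def extra_bits_needed(increase_ref, target):
    """Smallest number of extra bits k with increase_ref / 4**k <= target."""
    k = 0
    increase = increase_ref
    while increase > target:
        increase /= 4.0
        k += 1
    return k


# Hypothetical reference: a 5% increase with 6 fractional coefficient bits.
print(predicted_increase(0.05, 6, 7))   # one extra bit
print(extra_bits_needed(0.05, 0.01))    # extra bits to get below 1%
```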
The above discussion applies to the actual wordlengths found by the direct method. Now let us examine how useful it would be to compare structures using the SWL statistical estimate. For the ten structures shown, the SWL estimate ranged from 0 to 6 bits conservative, which is quite a wide range. However, this situation is easily explained. Structures (d), (g), and (i) had the poorest estimates. Not coincidentally, all three of these structures have two particular coefficients in common, -.9938344 and 1.9938281 (see Appendix A), and these two coefficients dominate in the expression for statistical coefficient wordlength for these examples. Removing these two coefficients from the statistical wordlength analysis produces estimates within one bit of the true wordlength. Thus these cases represent low-probability events (from the left-hand tail of the distribution in figure 6-1). In any case, these two particular coefficients resulted from pairing the two real, near-unit-magnitude poles, which has already been shown to be a poor choice with respect to finite wordlength performance. Of the ten structures, the SWL estimate is excellent (0 to 1.1 bits conservative) for the five lowest coefficient wordlength structures and the cascade (h).

As for a comparison between the SWL and MSWL estimates, the MSWL value was consistently .68-.94 bits below the SWL value. This tight range of values suggests that the distribution of ΔJ shown in figure 6-1 is quite narrow. Thus the MSWL, which is simpler to compute, may well be preferable to the SWL. One could compute the MSWL and then add some fixed number, say one bit, to obtain an estimate. Given their apparently tight correlation, the primary advantage of using the MSWL estimate over the SWL would be in the constrained optimization procedure of Chapter 8. In principle, the optimization procedure could use either statistical estimate for its objective function. Since the MSWL estimate is simpler to compute, it would be preferable to the SWL for the objective function. In Chapter 8, this estimate will be used as the basis for finding a minimum coefficient wordlength structure.
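The following sketch tabulates the figure 6-2 data to check the quoted .68-.94 bit spread between SWL and MSWL and to try out the "MSWL plus one bit" estimate. Rounding that estimate up to a whole bit is an assumption made here for illustration.

```python
import math

# (SWL, MSWL, TWL) for structures (a)-(j), taken from figure 6-2.
results = {
    "a": (35.99, 35.05, 32), "b": (6.84, 6.16, 6), "c": (12.38, 11.52, 11),
    "d": (19.02, 18.14, 13), "e": (11.08, 10.22, 10), "f": (7.02, 6.2, 7),
    "g": (26.25, 25.38, 21), "h": (14.61, 13.81, 14), "i": (24.25, 23.38, 20),
    "j": (9.06, 8.26, 9),
}

gaps = {s: round(swl - mswl, 2) for s, (swl, mswl, _) in results.items()}
print("SWL - MSWL ranges from", min(gaps.values()), "to", max(gaps.values()))

# Candidate quick estimate: MSWL plus one bit, rounded up to a whole bit.
for s, (swl, mswl, twl) in results.items():
    estimate = math.ceil(mswl + 1)
    print(s, estimate, twl, "conservative" if estimate >= twl else "too short")
```

For all ten structures this estimate is conservative, although, like the SWL, it is far too pessimistic for (d), (g), and (i).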

Figure 6-3 shows a plot of the execution times of the TWL, SWL, and MSWL routines, as run on an Amdahl 470 at the Charles Stark Draper Laboratory, versus the number of coefficients in the structure. From this figure we can see that the TWL computation takes between 1.1 and 1.36 seconds (excluding structure (j), whose execution time could be reduced by writing the routine more efficiently). For approximately 20 coefficients or fewer, the SWL estimate is somewhat faster to compute than the TWL value, by about 15% to 30%, and the MSWL is at least another tenth of a second faster than this.

Figure 6-3: Execution times vs. Number of Coefficients

One important advantage of either the SWL or MSWL estimates is the second-order sensitivities they produce. Once these values are computed, it is easy to see which coefficients dominate as far as the required coefficient wordlength is concerned. The portion of the structure in which these coefficients occur is then a likely candidate for optimization as described in Chapter 8. Specifically, the second-order section in which these coefficients occur should be unconstrained; in other words, it should not have a direct form II structure, but rather a structure with more coefficients and thus more degrees of freedom. The optimization procedure will then exploit these extra degrees of freedom and produce an overall structure with a lower required coefficient wordlength.

In addition, there is a further advantage to knowing the individual sensitivities. In figure 6-2 we can see that structures (a), (c), (d), (e), (g), (h), and (i) have at least one large coefficient that requires a large number of integer bits (more than 1) in the fixed-point coefficient word. By replacing each of these coefficients with a smaller-magnitude coefficient followed by a shift, we can reduce the number of integer bits that are required. The amount of the shift (number of bits) will be limited by the coefficient sensitivities. For example, structure (d) has only 2 coefficients larger than two (see Appendix A). Their ideal values are approximately 15.7 and -15.7. From the SWL analysis, their second-order sensitivities $\partial^2 J/\partial a_i^2$ are approximately 0.043. The dominant sensitivities with respect to determining the actual coefficient wordlength (for the sake of this discussion we will leave out the coefficients 1.99383281 and -.9938344 mentioned above) are on the order of 160. Since each factor of 2 decrease in a coefficient value results in a factor of 4 increase in its sensitivity (because we are taking second-order sensitivities), we can decrease these two large coefficients by a factor of 8 (three bits) while only increasing their sensitivities to about 2.8 (0.043 × 4³ ≈ 2.75). Since this is still insignificant with respect to 160, the statistical (and true) fractional wordlengths will not increase appreciably. The net result is a savings in total wordlength of three bits (from 13 to 10 total bits), while adding only two simple three-bit shifts to the hardware. Note that such a shift operation does not involve any additional hardware, just a rewiring of the respective multiplier output and the following quantizer or adder input. In structure (d), all we are doing is replacing a multiplication by 15.7173777648272 with a multiplication by 1.9646722206034 and a three-bit shift (a multiplication by 8), and similarly for the other large coefficient. The table presented in figure 6-4 shows the reduction possible for all ten structures (where this method applies). Structures (c) and (e) now rate so much better in terms of required wordlength that they are nearly as good as the best choices, (b) and (f).
     structure                    I     TWL           possible shift   expected TWL
                                        (no shifts)   (bits)           (with shifts)
(a)  direct form II               16    32            11               21
(b)  parallel direct form II      1     6             unnecessary      6
(c)  parallel direct form II      4     11            3                8
(d)  parallel direct form II      4     13            3                10
(e)  parallel 1-level from (c)    3     10            2                8
(f)  block optimal parallel       1     7             unnecessary      7
(g)  cascade, direct form II      11    21            6                16
(h)  cascade, direct form II      6     14            3                11
(i)  cascade, direct form I       9     20            4                16
(j)  simple                       1     9             unnecessary      9

Figure 6-4: Shifting to Reduce Coefficient Wordlength
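The arithmetic behind the structure (d) example can be reproduced in a few lines. This sketch uses the rounded coefficient and sensitivity values quoted in the text; shifting until the stored coefficient has exactly one integer bit is an assumption that mirrors that example, since in general the sensitivities limit how far one may shift.

```python
# Shift-based reduction of integer bits, using the structure (d) numbers
# quoted in the text. Bringing the coefficient into [1, 2) is an assumption
# that mirrors the example; in general the sensitivities limit the shift.

def integer_bits(c):
    """Integer bits (excluding sign) needed so that |c| < 2**bits."""
    bits = 0
    while abs(c) >= 2 ** bits:
        bits += 1
    return bits


c = 15.71737776      # large coefficient of structure (d), rounded
sens = 0.043         # its second-order sensitivity from the SWL analysis
dominant = 160.0     # order of the dominant sensitivities

shift = integer_bits(c) - 1      # keep one integer bit after shifting
c_small = c / 2 ** shift         # coefficient actually stored
new_sens = sens * 4 ** shift     # second-order sensitivity grows 4x per bit

print("shift:", shift, "bits; stored coefficient:", c_small)
print("integer bits saved:", integer_bits(c) - integer_bits(c_small))
print("new sensitivity:", round(new_sens, 3), "vs dominant", dominant)
```

Whether 2.75 counts as "insignificant" next to 160 is a judgment call; the sketch only reports the two numbers side by side.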


§6.7 Joint Analysis of Roundoff Noise & Coefficient Rounding Effects

Chapters 5 and 6 have presented analyses of roundoff noise effects and finite coefficient wordlength effects as if the two were completely independent. Ideally, one would want to analyze the roundoff effects on a structure using its actual finite wordlength coefficients. However, the structure must of course be scaled before the coefficient wordlength analysis can be carried out. Thus, a near-ideal structural selection procedure would first scale the structure, then compute its required coefficient wordlength, round the infinite-precision coefficient values to that wordlength, and finally compute the necessary signal variable wordlength via a roundoff noise analysis using the rounded coefficient values. The procedure we have followed differs in that the roundoff analysis is performed using the infinite-precision coefficient values rather than the rounded values. This simplification was made for two reasons. First, using infinite-precision coefficients in the roundoff analysis causes only very minor changes compared with using the finite wordlength coefficients (a second-order effect). Second, the roundoff analysis procedure is approximate to start with; we are adding just one more small approximation. Once the roundoff analysis procedure of Chapter 5 and the statistical coefficient wordlength determination methods of Chapter 6 have been used to select one of a group of candidate structures, it would be advisable to go back and do a more careful analysis of the finite wordlength effects and required wordlengths for this structure.

A more important observation is the following: we have assumed that the increase in J due to roundoff noise (including the A/D contribution) must be limited to some level, say 5% of the ideal J, and that the increment due to finite wordlength coefficients must also be limited to some level $E_q$, say 5%. Thus the total degradation will be approximately the sum of these values, or 10%. There is no inherent reason why the overall error budget must be split evenly between these two effects. In fact, once a structure is selected using the techniques described in Chapters 5 and 6, the respective required wordlengths can be modified, perhaps to convenient or more nearly equal values, by apportioning the two error limitations differently. This degree of freedom should be exploited to help simplify the hardware by conforming to more standard wordlengths, and thus to less expensive and more readily available hardware components.
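To illustrate such apportioning, the sketch below finds, for each coefficient wordlength, the smallest signal wordlength that keeps the predicted total increase in J within a 10% budget, using the approximation that each bit changes a contribution by a factor of four. The reference increases (3% at 6 coefficient bits, 2% at 9 signal bits) are hypothetical and serve only as an example.

```python
from fractions import Fraction

# Hypothetical reference points: alone, each error source would give these
# increases in J at these wordlengths (fractional bits).
COEF_REF_BITS, COEF_REF_INC = 6, Fraction(3, 100)
SIG_REF_BITS, SIG_REF_INC = 9, Fraction(2, 100)


def increase(ref_bits, ref_inc, bits):
    """Predicted increase in J, assuming a 4x change per bit (quadratic J)."""
    return ref_inc * Fraction(1, 4) ** (bits - ref_bits)


def min_signal_bits(coef_bits, budget, max_bits=32):
    """Smallest signal wordlength meeting the total budget, or None."""
    coef_part = increase(COEF_REF_BITS, COEF_REF_INC, coef_bits)
    for bs in range(1, max_bits + 1):
        if coef_part + increase(SIG_REF_BITS, SIG_REF_INC, bs) <= budget:
            return bs
    return None


budget = Fraction(10, 100)   # 10% total degradation in J
for bc in range(5, 10):
    print("coefficient bits:", bc, "-> signal bits:", min_signal_bits(bc, budget))
```

Under these assumed numbers, spending one more bit on the coefficients (7 instead of 6) saves one signal bit (8 instead of 9), which is the kind of trade the budget split makes available.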

§6.8 Summary 

In this chapter, we have examined the coefficient wordlength issue for digital feedback compensators. The use of a statistical approach to the determination of an acceptable wordlength was stressed. The common digital filtering estimate was shown to be inadequate for LQG compensators due to the optimal nature of an LQG design. Through the inclusion of second-order sensitivities in the statistical formulation, we derived a statistical estimate that is appropriate to the LQG problem, and in fact to any design problem involving the optimization of its performance criterion. For comparison, a direct method for determining the required coefficient wordlength was presented, and 10 example structures were compared.

Based on the results presented in section 6.6, we can conclude that the SWL or MSWL estimates are not simple enough to overwhelmingly justify their use (instead of the TWL calculation) on a calculation-time basis alone. However, there are two excellent advantages for which we highly recommend their use. First, the resulting second-order sensitivities are an excellent guide for (1) reducing the required wordlength of certain structures with large coefficients (greater than two), and (2) discovering which sections of a structure dominate in determining the required wordlength (this information could be used to select which portion of a structure to optimize, as discussed in Chapter 8). Second, through the use of the MSWL as an objective function, we can effectively determine a constrained minimum coefficient wordlength structure by applying transformations as described in Chapter 8. Once a set of candidate structures has been compared with regard to their roundoff noise, coefficient wordlength effects (using the statistical estimates), precedence levels, and so forth, and a structure selected, we should analyze it in more detail. Specifically, it would then definitely be worthwhile to evaluate the TWL as a final step in determining the required coefficient wordlength.
§7.1 Introduction

The roundoff noise analysis of Chapter 5 depends on the validity of the additive white noise model for roundoff quantization. However, this model is not always valid. In particular, a digital structure can exhibit oscillations known as limit cycles. Any linear system including one or more nonlinearities can exhibit autonomous oscillations due to those nonlinearities. For digital filters or compensators, quantization nonlinearities exist after each multiplication product or sum of products, and overflow nonlinearities exist after each adder. In addition, both nonlinearities operate on the ideal A/D converter output. We can classify the resulting oscillations as quantizer limit cycles or overflow limit cycles, depending on the type of nonlinearity that causes them. Of these two types, the overflow limit cycle tends to be far more damaging to performance: when it occurs, it has an amplitude equal to the maximum representable digital signal.

In the digital signal processing literature, there are a great number of results concerning limit cycles. An excellent review of this literature can be found in Kaiser [21], or in the finite wordlength survey articles by Claasen, Mecklenbräuker, and Peek [60] and Oppenheim and Weinstein [57]. Willsky [16] presents a comparison of these results to the nonlinear system stability results known to the control and estimation field. Rather than cataloguing all the different results and techniques used for dealing with limit cycles in digital filters, we will confine our effort to the more general approaches, since they are more likely to extend to the control environment.
Several points concerning the digital signal processing limit cycle results should be mentioned. First, most of these results concern zero-input limit cycles, oscillations that occur when there is no input driving the filter. When a nonzero input is present, it is unclear just what limit cycle behavior means, since the response of the filter to the input can be superimposed on an oscillation, or it can actually eliminate the oscillation [79]. Second, most of the digital filtering limit cycle results are specific to a single structure, usually the second-order direct form II structure. Since limit cycles can only be caused by nonlinearities in the recursive part of a filter, these results are further specific to the pole section of the direct form II structure. Two general conclusions follow from the digital filtering results. First, for avoiding quantizer limit cycles, sign-magnitude truncation is to be preferred over roundoff. Recall that the reverse is true when quantization noise minimization is considered. Second, for avoiding overflow limit cycles, the saturation characteristic is to be preferred over the two’s complement overflow characteristic. For overflow, it is important to keep in mind that the two’s complement characteristic requires no additional hardware, since it is implicit in any addition using two’s complement arithmetic. Additional hardware is required to implement the saturation characteristic.
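The hardware remark above can be illustrated on integer words. Because two's complement addition is arithmetic modulo 2^b, intermediate overflows in an accumulation cancel whenever the final sum is representable, whereas saturating after each addition can corrupt the result. The 8-bit word size and the product values in this sketch are hypothetical.

```python
def wrap(x, bits):
    """Two's complement overflow: map x into [-2**(bits-1), 2**(bits-1)) mod 2**bits."""
    half = 1 << (bits - 1)
    return ((x + half) % (1 << bits)) - half


def saturate(x, bits):
    """Saturation overflow: clip x to the representable range."""
    half = 1 << (bits - 1)
    return max(-half, min(half - 1, x))


def accumulate(products, bits, overflow):
    """Sum products one at a time, applying `overflow` after every addition."""
    acc = 0
    for p in products:
        acc = overflow(acc + p, bits)
    return acc


products = [100, 90, -80]   # hypothetical products on 8-bit words [-128, 127]
print("true sum:  ", sum(products))
print("wraparound:", accumulate(products, 8, wrap))
print("saturation:", accumulate(products, 8, saturate))
```

The wraparound accumulator recovers the correct sum of 110 despite the intermediate overflow at 190, while the saturating one returns 47. To saturate correctly, the accumulator would need at least one extra bit to hold the partial sum before clipping.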

As a whole, our results concerning limit cycles in digital feedback compensators are limited. However, in this chapter we will make four observations. First, we will point out that zero-input limit cycles always occur for control systems with open-loop unstable plants. Second, we will stress just how the feedback loop of a control system can alter the limit cycle performance of a digital compensator. In fact, even if the compensator alone has no limit cycles, the feedback system of plant and compensator together can exhibit limit cycles. Third, we will show that, for a variety of reasons, the limit cycle results in digital signal processing do not generally apply to the control setting. Finally, we will discuss the significant question of whether limit cycles themselves are an issue at all for LQG systems. At the simplest level, no LQG system can even be thought of as zero-input, given the system driving and measurement noises.

The remainder of this chapter is organized as follows. Sections 7.2 and 7.3 will present the more general digital signal processing approaches for dealing with quantizer limit cycles and overflow limit cycles, respectively. Finally, section 7.4 will consider the various aspects of the limit cycle issues as they concern digital feedback compensators. Specifically, the observations mentioned above will be dealt with in greater depth.

§7.2 Quantizer Limit Cycles 

There are three basic approaches for dealing with the limit cycles caused by the quantization nonlinearities in a digital structure. The first of these is simply to apply general nonexistence results, which guarantee that limit cycles do not occur. Many of these are so general as to apply to the overflow case as well. This procedure can be quite restrictive as to the types of structures and quantizers (roundoff or sign-magnitude truncation) that apply. The second approach is quite different: if we can bound the magnitude of the quantization effects (this bound would include limit cycle and noise effects) to some level dependent on the wordlength, then we need only use wordlengths long enough to make these effects negligible. Such analysis techniques are frequently based on Lyapunov theory [16]. Finally, the last procedure involves random rounding; basically, this refers to the technique of adding randomness at selected points in a structure to break up potential limit cycles. Of course, this technique tends to add noise to the system, requiring longer wordlengths to restore performance to desired levels. All three of these methods will be reviewed in this section, and their extensions to the LQG control problem considered.
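As a minimal illustration of the third approach, consider the first-order zero-input recursion x(k+1) = Q(0.9 x(k)), with states measured in units of the quantization step. Ordinary rounding traps the state in a deadband, a constant nonzero limit cycle, while a randomized rounding rule lets it decay to zero. The specific rule used here (round up with probability equal to the fractional part, often called stochastic rounding) and the pole value are assumptions, not necessarily the scheme studied in the literature cited in this chapter.

```python
import math
import random
from fractions import Fraction

A = Fraction(9, 10)   # hypothetical first-order pole; state in units of delta


def round_half_up(x):
    """Conventional rounding to the nearest integer (halves rounded up)."""
    return math.floor(x + Fraction(1, 2))


def random_round(x, rng):
    """Round down or up at random, rounding up with probability frac(x)."""
    lo = math.floor(x)
    return lo + (1 if rng.random() < x - lo else 0)


def run(quantize, x0=10, steps=2000):
    """Zero-input recursion x(k+1) = quantize(A * x(k)); returns the final state."""
    x = x0
    for _ in range(steps):
        x = quantize(A * x)
    return x


print("rounding:", run(round_half_up))
rng = random.Random(1)
print("random rounding:", run(lambda x: random_round(x, rng)))
```

With rounding the state settles at 5, because 0.9 x 5 = 4.5 rounds back to 5. The random rule eventually reaches 0, which is absorbing, at the cost of extra error along the way.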

§7.2.1 General Nonexistence Results

We will discuss three general nonexistence results described in the digital signal processing literature. The first of these is a frequency-domain criterion introduced by Claasen, Mecklenbräuker, and Peek [80] and based on the sector nature of the quantizer and/or overflow nonlinearities. Let us divide the digital filter under consideration into its linear and nonlinear portions as in figure 7-1.

Figure 7-1: System Divided into Linear and Nonlinear Portions

In general, multiple nonlinearities must be considered. The signals $\epsilon_i(k)$ and $v_i(k)$ will represent the input and output of the $i$th nonlinearity. The linear portion of the system in figure 7-1 can be described by the transfer matrix $W(z)$, where

$$\mathcal{E}(z) = W(z)\,V(z) \tag{7.1}$$

and $\mathcal{E}(z)$ and $V(z)$ are the z-transforms of $\epsilon(k)$ and $v(k)$, respectively. Now let us assume that each nonlinearity is a sector nonlinearity; that is, it lies entirely within the shaded sector of figure 7-2, where $m_i$ is the sector slope.

Figure 7-2: Sector Nonlinearity

(For roundoff quantization, $m_i = 2$, and for sign-magnitude truncation or overflow nonlinearities, $m_i = 1$.) The result derived in [80] states the following: given $k_0$ nonlinearities as described above, and a $W(z)$ that is finite for $|z| = 1$, zero-input limit cycles of period $N$ are absent if:
$$\operatorname{Re}\left\{ W\!\left(e^{j2\pi h/N}\right) - \operatorname{diag}\left(\frac{1}{m_i}\right) \right\} < 0 \tag{7.2}$$

for $h = 0, 1, \ldots, \lfloor N/2 \rfloor$.

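Furthermore, if the nonlinearities are also time-invariant, with a symmetric nondecreasing characteristic, then limit cycles of period $N$ are absent if the real part of

$$\left[ I + \operatorname{diag}_{i=1,\ldots,k_0}\left( \sum_{j=1}^{N-1} \left( \alpha_{ij}\, z^{j} + \beta_{ij}\, z^{-j} \right) \right) \right] \left[ W(z) - \operatorname{diag}\left( \frac{1}{m_i} \right) \right] \tag{7.3}$$

is negative definite ($< 0$) for some choice of $\alpha_{ij}$ and $\beta_{ij}$ greater than or equal to zero, with $z = e^{j2\pi h/N}$.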

Equation (7.3) is more difficult to apply than (7.2), since linear programming techniques must be used to take advantage of the α and β parameters. However, (7.3) is a more useful condition, since it may prove nonexistence when (7.2) does not. (Note that for α = β = 0, conditions (7.2) and (7.3) are identical.) Unfortunately, both of these relations require multiple evaluations (one per N), not to mention the task of proving negative definiteness. We can simplify the application of (7.2) and (7.3) somewhat by expressing these conditions differently. Siljak [81] has found an efficient technique for proving the positive realness of a function G(z), which he has extended to the matrix G(z) case. (A real rational function G(z) is strictly circle positive real if it has no poles outside the unit circle and the real part of G(z) is strictly positive on and outside the unit circle.) Thus, for one nonlinearity, we could replace the repeated evaluation of (7.2) with a test for the positive realness of $\frac{1}{m} - W(z)$. Still, this procedure is not terribly simple, especially in the matrix case. Application of (7.2) and (7.3) to the one- and two-quantizer two-pole direct form II sections of figure 7-3, for both roundoff and sign-magnitude truncation nonlinearities, shows the advantage of using sign-magnitude truncation over roundoff: the range of possible a and b values for which a quantizer limit cycle cannot occur is much greater under sign-magnitude truncation quantization [21,60].

Figure 7-3: Direct Form II (no zeros); 1 and 2 Quantizers. (a) One quantizer (after adder); (b) two quantizers (before adder).
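For a single quantizer, condition (7.2) reduces to a scalar check on the unit circle. The sketch below assumes the one-quantizer pole section v(k) = Q(a v(k-1) + b v(k-2)), so that the linear part seen by the quantizer is W(z) = a z^-1 + b z^-2 and Re W(e^{j theta}) = a cos(theta) + b cos(2 theta). This sign convention for a and b and the example values are assumptions. Because (7.2) is only sufficient, a False result means nonexistence was not proven, not that a limit cycle must occur.

```python
import math


def cmp_condition(a, b, m, N):
    """Scalar form of (7.2) for W(z) = a z^-1 + b z^-2 and sector slope m.

    Returns True if Re W(e^{j 2 pi h / N}) - 1/m < 0 for h = 0, ..., floor(N/2),
    using Re W(e^{j theta}) = a cos(theta) + b cos(2 theta).
    """
    for h in range(N // 2 + 1):
        theta = 2 * math.pi * h / N
        if a * math.cos(theta) + b * math.cos(2 * theta) - 1.0 / m >= 0:
            return False
    return True


def no_limit_cycles_up_to(a, b, m, n_max):
    """True if (7.2) rules out zero-input limit cycles of every period N <= n_max."""
    return all(cmp_condition(a, b, m, N) for N in range(1, n_max + 1))


a, b = 1.1, -0.5   # hypothetical stable pole pair (radius sqrt(0.5))
print("sign-magnitude truncation (m = 1):", no_limit_cycles_up_to(a, b, 1, 64))
print("roundoff (m = 2):", no_limit_cycles_up_to(a, b, 2, 64))
```

For this pole pair the criterion proves the absence of limit cycles under sign-magnitude truncation but not under roundoff, in line with the advantage of truncation noted above.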

A different limit cycle nonexistence result for digital filters can be related to the norm of the transition matrix of one-level state space structures [82,83,84,85,86]. This procedure can be applied to either the quantizer or the overflow limit cycle. Suppose we have a one-level state space digital filter structure:

$$v(k+1) = f\big[A\,v(k) + B\,y(k)\big], \qquad u(k) = C\,v(k) \tag{7.4}$$

where f represents all the nonlinear operations of the compensator. Note that the type of nonlinearity implied by (7.4) can act only on the ideal values A v(k). Thus quantization must occur after addition, implying double-precision adders, and similarly for the overflow nonlinearity. For two’s complement overflow (see section 7.3), this requirement presents no difficulty; if we define Q( ) to represent the two’s complement nonlinearity, the following relationship is true [82]:

$$Q(w_1 + w_2 + w_3 + \cdots) = Q\big(Q(w_1) + Q(w_2) + Q(w_3) + \cdots\big) \tag{7.5}$$

where each $w_i$ is the result of a multiplication. The same cannot be said of the saturation overflow characteristic, and one or two extra adder bits are required to accumulate the true sum before applying saturation.

Let us consider the zero-input case ($y(k) = 0$) for (7.4). For quantizer and overflow nonlinearities, we can show that:

$$\|f(v)\|_2 \le \gamma\,\|v\|_2 \quad \text{for all } v \tag{7.6}$$

where $\|v\|_2$ refers to the Euclidean norm $(v'v)^{1/2}$. For sign-magnitude truncation and all common overflow characteristics, γ would be 1, while for roundoff γ would be 2.
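The constants γ in (7.6) follow from scalar bounds on each quantizer, which extend to vectors component by component. This sketch checks them numerically by random sampling, together with the familiar per-sample error bounds (at most delta/2 for roundoff, less than delta for sign-magnitude truncation). The step size delta = 1/16 and the sampling range are hypothetical choices.

```python
import math
import random


def q_round(x, delta):
    """Roundoff to the nearest multiple of delta (ties away from zero)."""
    return math.copysign(math.floor(abs(x) / delta + 0.5) * delta, x)


def q_smtrunc(x, delta):
    """Sign-magnitude truncation: magnitude truncated toward zero."""
    return math.copysign(math.floor(abs(x) / delta) * delta, x)


def worst_case(q, delta, samples=20000, seed=0):
    """Largest observed |q(x)|/|x| and |q(x) - x| over random nonzero x."""
    rng = random.Random(seed)
    worst_gain, worst_err = 0.0, 0.0
    for _ in range(samples):
        x = rng.uniform(-4 * delta, 4 * delta)
        if x == 0.0:
            continue
        y = q(x, delta)
        worst_gain = max(worst_gain, abs(y) / abs(x))
        worst_err = max(worst_err, abs(y - x))
    return worst_gain, worst_err


delta = 1.0 / 16
for name, q in [("roundoff", q_round), ("sign-magnitude truncation", q_smtrunc)]:
    gain, err = worst_case(q, delta)
    print(f"{name}: max |Q(x)|/|x| = {gain:.3f}, max |Q(x)-x| = {err / delta:.3f} delta")
```

The roundoff gain approaches 2 for inputs just above delta/2, which are rounded up to delta; truncation never increases a magnitude, so its gain stays at or below 1.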


If we define the matrix norm of A as follows [83]:

$$\|A\|_2 = \max_{v \ne 0} \frac{\|A v\|_2}{\|v\|_2} \tag{7.7}$$

then we can write [83]:

$$\|A v\|_2 \le \|A\|_2\, \|v\|_2 \tag{7.8}$$

Combining (7.4), (7.6), and (7.8) produces:

$$\|v(k+1)\|_2 \le \gamma\, \|A\|_2\, \|v(k)\|_2 \tag{7.9}$$

Thus we can ensure the nonexistence of zero-input limit cycles by the condition

$$\gamma\, \|A\|_2 < 1 \tag{7.10}$$

since this implies a strictly decreasing state norm. Mills, Mullis, and Roberts [82] have expressed this result in a different manner for the more general case of $\|v\| = (v' D v)^{1/2}$, where D is a positive definite diagonal matrix, and for the case of an overflow nonlinearity (γ = 1): overflow (and hence sign-magnitude truncation with double-precision adders) limit cycles will not occur if and only if D - A'DA is positive definite. (This result is based on Lyapunov theory.)
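Condition (7.10) is easy to evaluate for second-order sections. The sketch below computes the spectral norm of a 2×2 matrix in closed form and compares two state models with the same pole pair: a companion-matrix (direct form) model and the coupled form with A = [[sigma, omega], [-omega, sigma]]. The particular companion-matrix realization and the pole location are assumptions chosen for illustration.

```python
import math


def spectral_norm_2x2(A):
    """||A||_2 of a 2x2 matrix: sqrt of the largest eigenvalue of A'A."""
    (a, b), (c, d) = A
    p = a * a + c * c          # (A'A)[0][0]
    q = a * b + c * d          # (A'A)[0][1]
    r = b * b + d * d          # (A'A)[1][1]
    lam_max = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(lam_max)


def direct_form(a1, a2):
    """Companion matrix of z^2 + a1 z + a2 (one common direct form state model)."""
    return ((-a1, -a2), (1.0, 0.0))


def coupled_form(sigma, omega):
    """Coupled (normal) form with poles at sigma +/- j*omega."""
    return ((sigma, omega), (-omega, sigma))


# Same pole pair for both models: radius 0.9 at 30 degrees (hypothetical).
radius, angle = 0.9, math.pi / 6
sigma, omega = radius * math.cos(angle), radius * math.sin(angle)
a1, a2 = -2 * sigma, radius ** 2

for name, A in [("direct", direct_form(a1, a2)), ("coupled", coupled_form(sigma, omega))]:
    norm = spectral_norm_2x2(A)
    print(name, round(norm, 4), "satisfies (7.10) with gamma = 1:", norm < 1)
```

Because the companion matrix maps the first unit vector to (-a1, 1), its norm is never below 1, so (7.10) cannot certify it. For the coupled form, A'A is (sigma^2 + omega^2) I and the norm equals the pole radius.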

Based on these results, it is natural to consider structures for which the norm of A is small (and of course less than 1). It can be shown that a minimum norm filter would be one for which:

$$\|A\|_2 = \max_i \{|\lambda_i|\} \tag{7.11}$$

where the $\lambda_i$ are the eigenvalues of A. This quantity is always less than 1 for (stable) digital filters; thus such filter structures have no overflow oscillations, and no quantizer oscillations under sign-magnitude truncation.

Barnes [84] discusses minimum norm filters composed of minimum norm sections of arbitrary order. However, we will restrict our attention to the more useful case of second-order sections. In fact, a minimum norm second-order section is identical to the Rader and Gold coupled form section mentioned in Chapter 6. The matrix A for a coupled form section with poles at σ ± jω would appear as follows [85] (see figure 7-4):

$$A = \begin{bmatrix} \sigma & \omega \\ -\omega & \sigma \end{bmatrix} \tag{7.12}$$

The absence of limit cycles under overflow and sign-magnitude truncation for this structure will not be affected by scaling [82].
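The argument of (7.9) and (7.10) can be watched in action for the coupled form. With sign-magnitude truncation applied after each full-precision sum, the state norm strictly decreases, so an integer-valued state (in units of the quantization step) must reach zero in finitely many steps. The pole location and initial state in this sketch are hypothetical.

```python
import math


def coupled_step(v, sigma, omega):
    """One zero-input step of the coupled form, states in units of delta.

    Each state sum is formed at full precision and then truncated toward zero
    (sign-magnitude truncation), i.e., quantization after addition.
    """
    x1 = sigma * v[0] + omega * v[1]
    x2 = -omega * v[0] + sigma * v[1]
    return (math.trunc(x1), math.trunc(x2))


def steps_to_zero(sigma, omega, v0, max_steps=1000):
    """Number of steps until the state reaches (0, 0), or None if it never does."""
    v = v0
    for k in range(max_steps):
        if v == (0, 0):
            return k
        v = coupled_step(v, sigma, omega)
    return None


# Poles at 0.8 +/- j0.5 (radius about 0.943); hypothetical initial state.
print(steps_to_zero(0.8, 0.5, (100, -40)))
```

No such guarantee holds if math.trunc is replaced by rounding, consistent with Jackson's result on roundoff limit cycles in the coupled form.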

For roundoff quantization, these norm-based results cannot be used to prove the nonexistence of limit cycles for the minimum norm structure unless the maximum filter eigenvalue magnitude is less than one half. In fact, Jackson has shown that roundoff limit cycles will occur for the coupled-form structure [86]. Fam and Barnes [85] have introduced a method for taking a filter structure whose A norm is greater than one half and computing an equivalent form whose norm is less than one half. This technique combines recursive and nonrecursive filter sections but greatly increases the number of multipliers and delays over the original structure.

Figure 7-4: Coupled Form (Normal) Second-Order Section

It should be mentioned here that these results tie directly into the results for wave digital filters. Fettweis and Meerkötter [47] have shown, through the use of a state norm called pseudopower, that overflow limit cycles and quantizer limit cycles will not occur in wave digital filters using sign-magnitude truncation quantization and any common overflow characteristic, such as two’s complement or saturation.


§7.2.2 Limit Cycle Amplitude Bounds

One common method for dealing with quantizer limit cycles is to bound their amplitude and then to choose a wordlength long enough to make this bound small. Many methods exist for formulating amplitude bounds on the effects of quantization, which of course must include limit cycle effects. A good review of these methods, many of which have been presented in the context of sampled-data control systems, can be found in [00] and [87]. In the results pertaining to digital filters [87,88,89], the direct form II second-order section is usually considered, or specifically the recursive portion of this section. Recall that only the nonlinearities in the recursive portion of a structure can give rise to limit cycles. Of course, this simplification is not possible for a control system, since the entire compensator structure is involved in the feedback loop.

We will discuss one of the more general approaches to limit cycle amplitude bounding. This approach involves the use of Lyapunov theory, and is considered for digital filters in [87] and for sampled-data control systems in [11] and [12]. Consider a system with the following state equation:

$$x(k+1) = A\,x(k) + B\,u(k) \tag{7.13}$$

where x represents the state and u the inputs. Following Parker and Hess [87], the system (7.13) is bounded-input bounded-output stable if the (zero-input) system

$$x(k+1) = A\,x(k) \tag{7.14}$$

is asymptotically stable in the large. If so, a Lyapunov function $x' P x$ exists, where P is the symmetric positive-definite solution to the equation

$$P = A' P A + C \tag{7.15}$$

for any symmetric positive-definite matrix C. If the input to the system (7.13) is upper bounded by some constant k, then an upper bound on the norm of the state vector x can be derived [11,12]. This bound, which again will include all the effects of quantization, is fairly complex to compute, is a function of A, B, P, and the eigenvalues of the C and P matrices, and will be directly proportional to k.
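The Lyapunov matrix P in (7.15) can be computed with a few lines of plain Python. This sketch uses simple fixed-point iteration, which is adequate for small, well-damped examples but is not an efficient general-purpose solver. The matrix A is hypothetical, and since the bound of [11,12] is not reproduced in the text, only P and the decrease of x'Px along the zero-input trajectory are shown.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def solve_discrete_lyapunov(A, C, tol=1e-13, max_iter=10000):
    """Solve P = A' P A + C, equation (7.15), by fixed-point iteration.

    The iteration P <- A' P A + C converges to sum_k (A')^k C A^k when every
    eigenvalue of A lies strictly inside the unit circle.
    """
    n = len(A)
    At = transpose(A)
    P = [row[:] for row in C]
    for _ in range(max_iter):
        APA = mat_mul(mat_mul(At, P), A)
        P_next = [[APA[i][j] + C[i][j] for j in range(n)] for i in range(n)]
        change = max(abs(P_next[i][j] - P[i][j]) for i in range(n) for j in range(n))
        P = P_next
        if change < tol:
            return P
    raise RuntimeError("no convergence; A may not be asymptotically stable")


def quad(P, x):
    """Quadratic form x' P x."""
    return sum(x[i] * P[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


A = [[0.5, 0.2], [-0.1, 0.6]]   # hypothetical stable state matrix
C = [[1.0, 0.0], [0.0, 1.0]]    # any symmetric positive-definite choice
P = solve_discrete_lyapunov(A, C)

# Along the zero-input trajectory (7.14), V(x) = x'Px drops by exactly x'Cx.
x = [1.0, -2.0]
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
print(P, quad(P, Ax) - quad(P, x), -quad(C, x))
```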

The procedure outlined above can be easily applied to digital filters [87] 
with one or more precedence levels. For the roundoff nonlinearity, we know that 
every roundoff quantization error is bounded by $\Delta/2$ ($\Delta$ for sign-magnitude truncation). We can simply define these quantizer errors as inputs to the filter system,
and then compute an upper bound on the filter state norm that is proportional to
$\Delta$. The difficulty that arises in using this bound is in selecting a Lyapunov function, or equivalently, in selecting C [87]. Consequently, this bound can be quite
loose, especially for certain combinations of filter parameters [87].
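A quick numerical check of these error bounds, as a sketch: values are quantized to multiples of a step Δ either by roundoff (ties away from zero) or by sign-magnitude truncation (magnitude rounded down, sign kept), and the worst-case errors are compared with Δ/2 and Δ. The 8-bit step size and the sample grid are arbitrary assumptions.

```python
import math

def round_q(v, delta):
    """Roundoff quantization to the nearest multiple of delta (ties away from zero)."""
    return math.copysign(math.floor(abs(v) / delta + 0.5), v) * delta

def trunc_q(v, delta):
    """Sign-magnitude truncation: discard the magnitude below delta (toward zero)."""
    return math.copysign(math.floor(abs(v) / delta), v) * delta

delta = 2.0 ** -7                      # hypothetical 8-bit step size
samples = [k * 0.0013 - 0.5 for k in range(800)]
err_round = max(abs(round_q(v, delta) - v) for v in samples)
err_trunc = max(abs(trunc_q(v, delta) - v) for v in samples)
print(err_round <= delta / 2, err_trunc <= delta)
```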

Other methods of computing limit cycle amplitude bounds either are even
less tight than the Lyapunov-based bound ([9,10]), or are not easily extendible to
the control system setting (such as the effective value method of Jackson [88]), 
or are even more difficult to compute (such as the matrix method of Parker and 
Hess [87]). 

§7.2.3 Random-Rounding Techniques for Limit Cycle Quenching

The previous two sections have described two different ways for dealing 
with limit cycles. The first involved using structures for which limit cycles could 
be proven not to exist. The second involved the use of sufficient signal bits to 
bound the limit cycle amplitude to a negligible level. A third method exists: eliminating limit cycles when they do occur (presumably as determined by simulating the
structure). The idea behind this procedure is that limit cycles (which represent a
correlated quantizer error effect) can be broken up, or decorrelated, by introducing some randomness into the quantization procedure. This procedure results in
the replacement of a periodic limit cycle by an aperiodic sequence of reduced
power [90]. Justification for this method can be found in Kieburtz [79], who reported limit cycle breakup as the level of a random input signal was raised. Further intuition for the technique can be drawn from the success enjoyed by dither techniques for the stabilization of unstable nonlinear systems [91,92].

Specific results concerning the use of randomized quantization methods ex- 
ist only for the case of the direct form II second-order section. The first method 
involves randomly switching between roundoff quantization and sign-magnitude 
truncation. By utilizing roundoff most of the time, its low-noise advantages can still 
be maintained, while the occasional use of sign-magnitude truncation will give us 
the reduced number of limit cycles common to this type of quantizer. Kieburtz, 
Lawrence and Mina [90] outline this method and present specific examples of its 
use. Unfortunately, such a technique cannot be guaranteed to eliminate all DC
and half-rate limit cycles (limit cycles with a two-sample period). However,
Lawrence and Mina [93] do describe some additional constraints that can be ad- 
ded to prevent such limit cycles. 
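The switching idea can be illustrated with a toy example; this is a sketch under stated assumptions, not the published procedure of [90]. It iterates the zero-input recursion y(n) = Q[a1 y(n-1) + a2 y(n-2)] with signals in units of Δ. With the hypothetical values a1 = 0.9, a2 = 0, and y = 5, roundoff sustains a DC limit cycle (0.9 × 5 = 4.5 rounds back to 5), sign-magnitude truncation decays to zero, and a quantizer that truncates with a small probability p (here 0.1, an assumption) and rounds otherwise also escapes the limit cycle.

```python
import math
import random

def round_q(v):
    # Roundoff to the nearest integer (units of delta), ties away from zero.
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def trunc_q(v):
    # Sign-magnitude truncation: magnitude rounded down, sign kept.
    return int(math.copysign(math.floor(abs(v)), v))

def run(a1, a2, y1, y2, quantize, steps=200):
    """Zero-input recursion y(n) = Q[a1*y(n-1) + a2*y(n-2)]; returns the output history."""
    history = []
    for _ in range(steps):
        y = quantize(a1 * y1 + a2 * y2)
        history.append(y)
        y1, y2 = y, y1
    return history

def random_switch(p, rng):
    """Hypothetical switching rule: truncate with probability p, otherwise round."""
    return lambda v: trunc_q(v) if rng.random() < p else round_q(v)

rounded = run(0.9, 0.0, 5, 0, round_q)
truncated = run(0.9, 0.0, 5, 0, trunc_q)
switched = run(0.9, 0.0, 5, 0, random_switch(0.1, random.Random(1)), steps=1000)
print(rounded[-3:], truncated[-3:], switched[-3:])
```

Because roundoff holds every state from 1 to 5 fixed in this example while each truncation lowers it by one, the switched quantizer needs only a handful of truncation events to reach zero, yet it still rounds on most samples.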

Büttner [94] has taken a different approach to implementing random quanti-
zation. In his approach, a random signal is injected at one point in the direct form
II structure to break up any possible limit cycle. One obvious difference with this 
approach is that, with no input to the filter, there will still be a noise output. In a 
control system, already driven by noise, this additional noise would probably be 
insignificant. Specifically, Büttner describes two possible approaches: first, in the
direct form II section with only one quantizer (after a double-precision addition), 
simply replace the least significant bit of the quantized sum with a random bit. 
This procedure produces 4 times the output noise power as compared to rounding, 
since the error introduced can be anywhere between $\pm\Delta$, but has the advantage
of eliminating all possible limit cycles. The second approach introduces a random 
least significant bit in one of the products input to the double-precision adder. 
Although this generates approximately half the noise generated by the first
method, it will not prevent the occurrence of limit cycles unless the input to the
second-order section is aperiodic and non-constant. Büttner then recommends using a cascade of second-order sections, with the first approach used to suppress
all limit cycles in the first section, and the second lower-noise approach used in 
all remaining sections. Since the input to these sections must contain the random 
output component generated by the first section, the second method will be 
sufficient to suppress all limit cycles in these sections. Examples were presented 
comparing this random rounding approach to the use of sign-magnitude truncation 
to eliminate limit cycles, and also to the use of roundoff quantization with longer 
wordlengths to reduce limit cycle amplitude. Again, all these results were generat- 
ed only for structures composed of direct form II second-order sections. 
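The factor of four quoted for Büttner's first approach can be checked from the variances of uniformly distributed errors. As a modeling simplification (an assumption, not a statement from [94]), take the rounding error to be uniform on $[-\Delta/2, \Delta/2]$ and the random least-significant-bit error to be uniform on $[-\Delta, \Delta]$:

```latex
\sigma^2_{\mathrm{round}} = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} e^2\,de = \frac{\Delta^2}{12},
\qquad
\sigma^2_{\mathrm{LSB}} = \frac{1}{2\Delta}\int_{-\Delta}^{\Delta} e^2\,de = \frac{\Delta^2}{3} = 4\,\sigma^2_{\mathrm{round}}
```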

§7.3 Overflow Limit Cycles 

In this section, we will examine the results specific to overflow limit cycles. 
Overflow limit cycles are particularly important because they have maximal amplitude; thus, of course, bounding techniques do not apply. In general, there are
two overflow characteristics of particular interest, saturation (figure 7-5a) and
two’s complement (figure 7-5b). A two’s complement overflow characteristic is the
natural overflow characteristic resulting when using two’s complement addition. 
No additional hardware is necessary to realize this overflow nonlinearity. The sa- 
turation overflow nonlinearity, which does require some hardware, is less prone to 
causing overflow limit cycles than the two’s complement characteristic. 
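The difference in hardware cost can be seen in a small sketch that models b-bit two's complement words as Python integers: wraparound amounts to discarding the carry out of the top bit, while saturation needs an explicit comparison. The word length b = 8 and the sample sums are assumptions for illustration.

```python
def twos_complement_wrap(v, bits):
    """Keep the low `bits` bits of an integer and reinterpret them as a signed value."""
    m = 1 << bits
    v &= m - 1                          # discard the carry out of the top bit
    return v - m if v >= (m >> 1) else v

def saturate(v, bits):
    """Clamp an integer to the representable range (needs extra comparison logic)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))

b = 8                                   # hypothetical word length
s = 100 + 60                            # exceeds the 8-bit maximum of 127
print(twos_complement_wrap(s, b), saturate(s, b))   # -96 127
```

A well-known side effect of wraparound, shown by `twos_complement_wrap(twos_complement_wrap(160, 8) - 50, 8)` returning 110, is that an overflowed intermediate sum still yields the correct final result whenever that result is representable.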

Two separate issues concerning overflow have been discussed in the digi- 
tal signal processing literature, the prevention of zero-input overflow limit cycles, 
and forced-response stability. Stability of the forced response means that the
filter must recover from an overflow, that is, return asymptotically to the state
values that would have occurred if no overflow nonlinearity had been present.

Figure 7-5: Common Overflow Characteristics

General results concerning zero-input overflow limit cycles can be inferred 
from the discussion in section 7.2.1 on the frequency-domain criterion of Claasen,
Mecklenbräuker, and Peek for the saturation nonlinearity (using a sector bound of 1). Using the
norm-based method of Barnes and Fam, or Mullis and Roberts, we can generate
nonexistence results that would apply to any common overflow characteristic.

Figure 7-6: Direct Form II with Overflow Nonlinearity

More specific results exist for the second-order direct form II section
shown in figure 7-6 and for structures composed of such sections. Willson [95]
and Ebert, Mazo, and Taylor [96] have found regions in the a, b parameter plane 
where overflow limit cycles will not occur with two’s complement overflow, and 
have shown that no limit cycle can occur when using the saturation overflow 
characteristic for any (stable) values a, b. In general, the saturation characteristic
is to be preferred over the two’s complement characteristic so far as overflow
limit cycles are concerned. However, it does require extra hardware components
to implement the saturation overflow characteristic. Thus we would test the general conditions in section 7.2.1 to see whether or not the use of two’s complement overflow could cause oscillations. The use of the saturation overflow
characteristic, with its additional hardware, would be advised whenever the general criteria of section 7.2.1 did not succeed in guaranteeing the absence of limit
cycles for the two’s complement characteristic.

Recovery from overflow can be determined by the following general result,
also derived by Claasen, Mecklenbräuker, and Peek [97]: if a system has no
zero-input overflow limit cycles for all time-varying nonlinearities satisfying

$$-m_k \le \frac{\phi_k(x)}{x} \le 1 \quad \text{for } x \neq 0, \text{ and } m_k > 0 \text{ for all } k \qquad (7.16)$$

where $\phi_k(\cdot)$ is the overflow nonlinearity (a condition that could possibly be tested
using the general criteria described in section 7.2.1), then the forced response
will be stable for all overflow nonlinearities satisfying (see the shaded portion of
figure 7-7):

$$1 + m_k - m_k x \le \phi_k(x) \le 1 \quad \text{for } x > 1$$
$$-1 - m_k - m_k x \ge \phi_k(x) \ge -1 \quad \text{for } x < -1 \qquad (7.17)$$

This result means that a system with no zero-input overflow limit cycles for all
overflow characteristics satisfying (7.16) with $m_k = 1$ (such as the wave digital
filter) will be forced-response stable for characteristics satisfying (7.17). Saturation satisfies (7.17), but two’s complement overflow does not. Again, this result
demonstrates the general advantage of saturation over two’s complement overflow
so far as limit cycles are concerned.
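Condition (7.17) can be checked numerically for a given characteristic. The sketch below works on normalized signals, where |x| ≤ 1 is the representable range, samples x on a grid outside [-1, 1], and tests the bounds of (7.17) with m_k = 1 held constant. The grid and the real-valued model of wraparound are assumptions.

```python
def saturation(x):
    return max(-1.0, min(1.0, x))

def twos_complement(x):
    # Real-valued model of two's complement wraparound onto [-1, 1).
    return (x + 1.0) % 2.0 - 1.0

def satisfies_7_17(phi, m, xs):
    """Check 1+m-m*x <= phi(x) <= 1 for x > 1 and -1 <= phi(x) <= -1-m-m*x for x < -1."""
    for x in xs:
        if x > 1.0 and not (1.0 + m - m * x <= phi(x) <= 1.0):
            return False
        if x < -1.0 and not (-1.0 <= phi(x) <= -1.0 - m - m * x):
            return False
    return True

grid = [-4.0 + 0.01 * k for k in range(801)]
print(satisfies_7_17(saturation, 1.0, grid), satisfies_7_17(twos_complement, 1.0, grid))
# True False
```

Two's complement fails as soon as x exceeds 1: the wrapped value jumps to near -1, below the lower edge 1 + m_k - m_k x of the allowed region.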

Beyond the general result of (7.16) and (7.17), there also exist specific 
results concerning forced-response stability for the direct form II second-order 
section of figure 7-6 [98,99].
§7.4 Digital Feedback Compensator Limit Cycles

In this section we will consider the limit cycle issue as it relates to digital 
feedback compensators. Several important observations can be made. First, any 
digital control system with an open-loop unstable plant must exhibit quantizer limit 
cycles. Recall that the plant output is sampled, digitized, and quantized at the 
compensator input. This means that any output magnitude below the smallest 
quantization level is effectively ignored by the compensator. If the open-loop
plant is unstable, the output will tend to increase in magnitude until it reaches the
lowest quantization level, and some control action can occur to drive it back towards zero. However, the process will then repeat. The net result is a form of
low-amplitude limit cycle in the output of the system. Such a limit cycle will occur
no matter what the transfer functions of the plant and compensator are, as long
as a right-half plane pole exists, although these parameters will certainly affect
the amplitude and frequency of the limit cycle. A proper choice of A/D wordlength
would keep this amplitude at the system noise level, so that it could essentially
be ignored. One other implication of the presence of this limit cycle is that no
general digital filtering limit cycle nonexistence result can succeed in proving limit
cycle nonexistence for digital control systems with unstable open-loop plants.
Furthermore, even systems with open-loop plants that have poles at s = 0 can exhibit low-amplitude limit cycles if any DC offset exists in the output of the D/A converter.
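The mechanism can be reproduced with a hypothetical scalar example: an open-loop unstable discrete plant x(k+1) = 1.2 x(k) + u(k), a measurement quantized by roundoff with step Δ = 1, and the static feedback u(k) = -Q[x(k)]. Without the quantizer the closed-loop pole would be 0.2, yet the quantized loop never settles: |x| grows while it sits inside the deadband |x| < 0.5 and is knocked back each time it leaves. The plant, gain, and initial condition are illustrative assumptions.

```python
import math

def round_q(v):
    """Roundoff quantization with step 1, ties away from zero."""
    return math.copysign(math.floor(abs(v) + 0.5), v)

def simulate(a=1.2, x0=0.1, steps=300):
    x, traj = x0, []
    for _ in range(steps):
        u = -round_q(x)          # the compensator sees only the quantized output
        x = a * x + u
        traj.append(x)
    return traj

traj = simulate()
tail = traj[-100:]
print(max(abs(v) for v in traj), min(abs(v) for v in tail))
```

In this example every correction lands the state at a magnitude between 0.28 and 0.4, so the output keeps oscillating at an amplitude set by the quantization step rather than decaying to zero.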

One of the key points relating to compensator limit cycles is the overall 
effect of the closed loop on the limit cycle behavior of the compensator. For ex- 
ample, consider the digital compensator as a stand-alone digital network. Any limit 
cycles that this open-loop compensator may exhibit are strictly dependent on the 
nonlinearities in the recursive sections of the compensator. However, when the 
compensator is embedded in the feedback loop, all the nonlinearities are part of a
recursive portion of the control system, and thus are all involved in determining
limit cycle behavior. Thus, compensator limit cycles that would occur for the 
open-loop situation will be altered when the loop is closed. By the same reason- 
ing, even if the open-loop compensator would not exhibit limit cycles, the overall 
feedback system of plant and compensator together may exhibit limit cycles. As 
an example, consider the simple control system in figure 7-8. Any finite impulse 
response open-loop compensator or filter is non-recursive. Therefore it can have 
no limit cycles. However, when we embed such a filter in a closed-loop stable 
control system as in figure 7-8, limit cycles may occur. For the example above, 
let us measure signal amplitude in units of $\Delta$, the quantization step size, defined in
Chapter 5. With either roundoff or sign-magnitude truncation quantization, the output y can exhibit the following half-rate limit cycle:

+10, -10, +10, -10, ...

Figure 7-8: Control System with Finite Impulse Response Compensator

A related limit cycle result specific to feedback systems has been reported
by Fettweis and Meerkötter [100]. Motivated by the presence of digital filters in
looped telegraph systems, they have shown the following. For a finite impulse
response or wave digital filter embedded in a feedback loop, no quantizer limit cycles can occur if sign-magnitude truncation is used for all quantization operations
including the A/D, and

$$\max_{|z|=1} |W_1(z)| \cdot \max_{\omega} |W_p(j\omega)| < 1 \qquad (7.18)$$

where $W_1(z)$ is the transfer function of the digital network embedded in the loop,
and $W_p(j\omega)$ is the transfer function of the open-loop plant. This result is quite
similar to the small loop-gain theorem known to control theorists [60]. As with the
digital filtering results, the above condition points out the advantage of sign- 
magnitude truncation over roundoff quantization so far as limit cycles are con- 
cerned. Unfortunately, for control systems in general, the condition (7.18) is very 
restrictive in terms of the types of plants one could consider. Certainly any sys- 
tem whose plant had an integrator pole or even a strong resonance would not 
satisfy (7.18). However, this is the only real result in the literature for quantizer 
limit cycles in feedback systems. 
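As a rough numerical illustration of (7.18), the sketch below estimates both maxima on frequency grids for a hypothetical loop: an FIR network W1(z) = 0.5 z^-1 and a stable first-order plant Wp(s) = 1/(s + 2). Both transfer functions and the grids are assumptions chosen so that the answer is easy to verify by hand (0.5 × 0.5 = 0.25 < 1). A plant with an integrator pole, Wp(s) = 1/s, has unbounded gain as ω → 0, which is why such plants cannot satisfy the condition.

```python
import cmath
import math

def max_gain_unit_circle(H, n=4096):
    """Approximate max |H(z)| over |z| = 1 by sampling z = e^{j*theta}."""
    return max(abs(H(cmath.exp(2j * math.pi * k / n))) for k in range(n))

def max_gain_imag_axis(G, w_max=1e3, n=20001):
    """Approximate max |G(jw)| over 0 <= w <= w_max (|G(-jw)| = |G(jw)| for real systems)."""
    return max(abs(G(1j * w_max * k / (n - 1))) for k in range(n))

def W1(z):
    return 0.5 / z            # hypothetical FIR network in the loop

def Wp(s):
    return 1.0 / (s + 2.0)    # hypothetical stable continuous-time plant

loop_bound = max_gain_unit_circle(W1) * max_gain_imag_axis(Wp)
print(loop_bound, loop_bound < 1.0)
```

Sampling can only underestimate a maximum, so in practice a grid check of this kind would need a safety margin or an analytic bound.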

Another important observation is that the techniques for dealing with limit 
cycles in digital filters do not tend to work for control compensators. As shown 
above, none of the nonexistence techniques can be extended to consider open-loop unstable plants. Now let us consider control systems whose plants have integrator poles. As a simple example, consider a double-integrator plant:

$$G(s) = \frac{1}{s^{2}} \qquad (7.19)$$


If we discretize this system at a sampling rate of 1 Hertz, and design a first-order
compensator, the three-quantizer configuration of figure 7-9 results. To apply the
results of Claasen, Mecklenbräuker, and Peek discussed in section 7.2.1, we must
first compute the matrix W(z). Defining e and v as shown in figure 7-9, the entries of W(z) are either zero or the compensator coefficients $a_1$ and $a_4$, except for the (2,1) entry, which contains the discretized double integrator:

$$[W(z)]_{21} = \frac{z^{-1}(1+z^{-1})}{(1-z^{-1})^{2}} \qquad (7.20)$$


Unfortunately, the (2,1) entry of W(z) is not finite on the entire unit circle, and
thus the results of (7.2) and (7.3) cannot be applied. This will be true for any
system whose plant has an integrator pole. One possible method for handling this
problem would be to replace the z = 1 poles in the W(z) matrix with poles at
z = 1 - ε, where ε > 0. Then we could evaluate (7.2) or (7.3) in the limit as ε → 0. However, this evaluation, or the application of the positive real test of Siljak, will be
even more complex to compute. Note that if a discretized plant has all its poles
entirely within the unit circle, then the Claasen, Mecklenbräuker, and Peek results
may be used directly.
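To see how the ε-perturbation behaves, take an entry of the form z^-1(1 + z^-1)/(1 - z^-1)^2 (the zero-order-hold discretization of 1/s² at T = 1, up to a factor 1/2) and move its double pole to z = 1 - ε. The peak gain on the unit circle is attained at z = 1 and equals 2/ε², so any criterion built on it degenerates as ε → 0. This entry is an assumption standing in for the (2,1) entry of W(z); the sketch only checks the growth rate.

```python
import cmath
import math

def w21_perturbed(z, eps):
    """z^-1 (1 + z^-1) / (1 - (1 - eps) z^-1)^2: double pole moved from z = 1 to z = 1 - eps."""
    zi = 1.0 / z
    return zi * (1.0 + zi) / (1.0 - (1.0 - eps) * zi) ** 2

n = 2048
peaks = {}
for eps in (0.1, 0.01, 0.001):
    peaks[eps] = max(abs(w21_perturbed(cmath.exp(2j * math.pi * k / n), eps)) for k in range(n))
    print(eps, peaks[eps], 2.0 / eps ** 2)   # sampled peak versus the predicted 2 / eps^2
```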

Now let us attempt to apply the general norm-based results of section
7.2.1. To account for the behavior of the entire closed-loop system, the state vector
in (7.4) would have to include both the plant and compensator states. Following the analysis of (7.6) through (7.10), this would involve the evaluation of
the norm of the closed-loop system matrix analogous to the matrix A in (7.4), and
the assumption that the nonlinearity f operates on every component of the closed-loop state vector. For the
compensator case this would be a very restrictive assumption, since in fact the 
nonlinearity only operates on the compensator states. Furthermore, the norm- 
based analysis applies only to one-level structures. Also, the main advantage to 
the norm-based technique, namely the derivation of minimum norm structures, can- 
not be applied to compensator structures; it would involve transforming the 
closed-loop system matrix A. However, this matrix is highly constrained given the
control system configuration of plant and compensator. Thus A cannot be subject
to arbitrary transformations.

The Lyapunov-based bound discussed in section 7.2.2 has actually been
used for control applications [11,12], and could even be used for open-loop unstable plants. In the analysis of section 7.2.2, let us consider the performance of
the entire closed-loop system. The vector x in (7.13) and (7.14) would have to
be replaced by a vector including all the plant and compensator states, as mentioned above. Of course, in the LQG case, we are not interested in bounding the
norm of x, but rather the performance-index-related quantity $x'Tx$, where T
is defined in (5.34). However, since T is a symmetric positive-definite matrix, it
can be factored into the product $T = \bar{T}'\bar{T}$. Thus we can define a new state $\bar{T}x$, whose squared norm is $x'Tx$, replace A by the similar matrix $\bar{T}A\bar{T}^{-1}$ and B by $\bar{T}B$, and proceed as outlined in (7.13)-(7.15). The resulting bound will be just as loose as for the filtering case; the difficulty will still be
in selecting the Lyapunov function.
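The change of variables can be sketched for a 2×2 case. The fragment below factors a hypothetical symmetric positive-definite weight T as T = T̄'T̄ using an upper-triangular Cholesky factor, transforms A and B into T̄AT̄⁻¹ and T̄B, and checks that x'Tx equals the squared Euclidean norm of the new state T̄x, while the trace and determinant of A (hence its eigenvalues) are unchanged. The matrices are assumptions, and any factor with T = T̄'T̄ would serve equally well.

```python
import math

def cholesky_upper_2x2(T):
    """Return upper-triangular U with U' U = T for a 2x2 symmetric positive-definite T."""
    u11 = math.sqrt(T[0][0])
    u12 = T[0][1] / u11
    u22 = math.sqrt(T[1][1] - u12 ** 2)
    return [[u11, u12], [0.0, u22]]

def inv_upper_2x2(U):
    return [[1.0 / U[0][0], -U[0][1] / (U[0][0] * U[1][1])], [0.0, 1.0 / U[1][1]]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

T = [[4.0, 2.0], [2.0, 3.0]]          # hypothetical performance weight
A = [[0.9, -0.2], [1.0, 0.0]]         # hypothetical closed-loop system matrix
B = [[1.0], [0.5]]

Tb = cholesky_upper_2x2(T)
A_new = matmul(matmul(Tb, A), inv_upper_2x2(Tb))   # similarity transform of A
B_new = matmul(Tb, B)

x = [[0.3], [-1.2]]
xTx = matmul(matmul([[x[0][0], x[1][0]]], T), x)[0][0]
z = matmul(Tb, x)                                  # new state vector
print(xTx, z[0][0] ** 2 + z[1][0] ** 2)
```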

The final point we would like to make concerns the general question of limit 
cycles in control systems. No LQG control system is actually zero-input in nature; 
there is always system noise present. According to the results of Büttner discussed in section 7.2.3, it is likely that this noise will quench autonomous oscillations if the noise level is large enough. Thus limit cycle oscillations themselves
may not be an issue in most control systems. However, there are other effects
caused by the nonlinear quantization operations in a compensator. First, jump
discontinuities may occur. In such a case, small changes in the input signal lead
to large jumps in the output [16]. Furthermore, we have not even considered the
effects of the correlated noise that results from the presence of quantization nonlinearities. Even if limit cycles do not occur, the presence of correlated noise in
control systems can significantly deteriorate performance. Recall that LQG systems are designed with the assumption that the system noises are white.
whole area is largely unexplored for digital control systems. 
Chapter 8: The Optimization of Structures


§8.1 Introduction

Techniques for the optimization of structures with respect to some scalar 
objective function are very important for the synthesis of compensator structures. 
Typically this objective function would involve either the increase in the performance index due to roundoff noise, or some measure of coefficient sensitivity such
as the SWL or MSWL, or perhaps a weighted combination of the two. In such a
technique, it is important to have control over the number of multipliers and delay
elements in the optimized structure, since these parameters are critical in determining the complexity of the hardware.

As shown in Chapter 3, any structure can be transformed to a new
(infinite-precision equivalent) structure through the use of a set of transformation
matrices. In the context of the modified state space appropriate to controllers, if
we have some scaled structure with parameters $\Psi_1, \ldots, \Psi_q$, then we can
transform this structure to one with parameters $\hat{\Psi}_1, \ldots, \hat{\Psi}_q$ by:

$$\hat{\Psi}_i = P_i\,\Psi_i\,(P_{i-1})^{-1} \quad \text{for } i = 1, \ldots, q \qquad (8.1)$$

where the $P_i$ for $i = 1, \ldots, q-1$ are general non-singular transformation matrices, and

$$P_0 = \begin{bmatrix} P & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad P_q = \begin{bmatrix} P & 0 \\ 0 & 1 \end{bmatrix}$$

The presence of unity entries in the matrices $P_0$ and $P_q$ is necessary so that
the actual input and output nodes of the original structure are not altered by the
transformation process. One consequence of this restriction is that the output
node scaling parameter p described in section 5.2 will be invariant to such
transformations.
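Why (8.1) preserves the infinite-precision behavior can be seen from the product of the level matrices, assuming the overall map is Ψ_q⋯Ψ_1: the interior transformations cancel, leaving only P_q on the left and P_0⁻¹ on the right. The sketch below checks this telescoping numerically for q = 3 square 2×2 levels with random, diagonally dominant (hence non-singular) P_i; the sizes and values are assumptions, and P_0 is taken as the identity for simplicity.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def chain(mats):
    # Level i acts after level i-1, so the overall map is mats[-1] ... mats[0].
    out = mats[0]
    for M in mats[1:]:
        out = matmul(M, out)
    return out

rng = random.Random(0)
q = 3
Psi = [[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)] for _ in range(q)]
P = [[[1.0, 0.0], [0.0, 1.0]]]                                   # P_0
P += [[[1.0 + rng.uniform(0, 1), rng.uniform(-0.5, 0.5)],
       [rng.uniform(-0.5, 0.5), 1.0 + rng.uniform(0, 1)]] for _ in range(q)]  # P_1 .. P_q

# (8.1): Psi_hat_i = P_i Psi_i P_{i-1}^{-1}  (Psi[i] holds Psi_{i+1})
Psi_hat = [matmul(matmul(P[i + 1], Psi[i]), inv2(P[i])) for i in range(q)]

expected = matmul(matmul(P[q], chain(Psi)), inv2(P[0]))
err = max(abs(a - b) for ra, rb in zip(chain(Psi_hat), expected) for a, b in zip(ra, rb))
print(err)
```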

Once we have computed (8.1), the new structure will have to be rescaled
so that it satisfies the same dynamic range constraints as the original untransformed structure. This overall technique will result in a new structure with
the same number of delay elements as the original. However, if the matrices $P_i$
are completely general, the number of coefficient multiplies (non-unity and non-zero
entries in the matrices $\hat{\Psi}_i$) will be very large. Thus it is necessary to constrain
the $P_i$ matrices in order to gain control over the resulting number of coefficient
multiplies.

Chan [17,101] has presented such a constrained optimization technique for 
digital filters, using a notation appropriate for describing digital filter structures. 
In section 8.2 we will present the steps involved in this constrained optimization 
technique for a general objective function, but in the context of the modified 
state space representation appropriate to digital feedback compensators (see 
Chapter 3). In section 8.3 we will adapt the technique of Chan for the minimiza- 
tion of roundoff noise effects in compensators, and apply the technique to a 
specific example. In section 8.4, we will use the MSWL estimate presented in 
Chapter 6 to adapt Chan’s general technique to the minimization of coefficient
rounding effects in compensators. No specific example will be presented. Finally,
in section 8.5 we will discuss methods for selecting which entries in the original
matrices are to be constrained (held constant), and which are to be varied,
presumably becoming non-zero and non-unity. This last section represents an important extension to the work of Chan, since it applies equally well to digital compensators and to digital filters.

§8.2 The General Constrained Optimization Technique of Chan

The optimization technique of Chan is based on the following observation 
[17,101] (here considered in the context of the modified state space representa- 
tion). Consider the differential equation (8.2): 

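$$\frac{d\Psi_i(t)}{dt} = G_i(t)\,\Psi_i(t) - \Psi_i(t)\,G_{i-1}(t) \quad \text{for } 1 \le i \le q \qquad (8.2)$$

where the matrices $G_i$ are of appropriate dimension. Any solution
$\{\Psi_1(t), \ldots, \Psi_q(t)\}$ at any t will represent a structure (infinite-precision)
equivalent to $\{\Psi_1(0), \ldots, \Psi_q(0)\}$ if:

$$G_0(t) = \begin{bmatrix} G(t) & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad G_q(t) = \begin{bmatrix} G(t) & 0 \\ 0 & 0 \end{bmatrix}$$

where G(t) is arbitrary. The solution to (8.2) has the form:

$$\Psi_i(t) = P_i(t)\,\Psi_i(0)\,(P_{i-1}(t))^{-1} \qquad (8.3)$$

where

$$\frac{dP_i(t)}{dt} = G_i(t)\,P_i(t) \quad \text{for } 0 \le i \le q \qquad (8.4)$$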

and the initial condition $P_i(0)$ matrices are identities. Starting with an initial
structure, which we will assume to be scaled, the technique basically integrates
(8.4) to obtain new transformed structures. The $G_i$ matrices are selected to
cause an overall reduction in some objective function. Constraining any particular
coefficient in a $\Psi_i$ matrix to be constant can be easily accomplished by holding
its derivative in (8.2) to zero, which implies constraints on the $G_i$ and $P_i$.
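The link between (8.3), (8.4), and the differential equation dΨ_i/dt = G_iΨ_i - Ψ_iG_{i-1} can be checked numerically: take one tiny Euler step of (8.4) from P(0) = I, form Ψ_i(h) by (8.3), and compare the finite difference with G_iΨ_i(0) - Ψ_i(0)G_{i-1}. The random 2×2 matrices and the step h are assumptions for illustration.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def euler_from_identity(G, h):
    # One Euler step of (8.4) from P(0) = I: P(h) = I + h G.
    return [[(1.0 if i == j else 0.0) + h * G[i][j] for j in range(2)] for i in range(2)]

rng = random.Random(3)
def rand2():
    return [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]

Psi0, G_prev, G_cur = rand2(), rand2(), rand2()   # hypothetical Psi_i(0), G_{i-1}, G_i
h = 1e-6
Psi_h = matmul(matmul(euler_from_identity(G_cur, h), Psi0),
               inv2(euler_from_identity(G_prev, h)))        # (8.3) at t = h

finite_diff = [[(Psi_h[i][j] - Psi0[i][j]) / h for j in range(2)] for i in range(2)]
GPsi, PsiG = matmul(G_cur, Psi0), matmul(Psi0, G_prev)
predicted = [[GPsi[i][j] - PsiG[i][j] for j in range(2)] for i in range(2)]
err = max(abs(finite_diff[i][j] - predicted[i][j]) for i in range(2) for j in range(2))
print(err)   # of order h
```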

Now let us present this procedure in detail. Define $\nu(\cdot)$ to be the operation
that forms a vector from a matrix by stacking its columns:

$$\nu(\Psi_i) = \begin{bmatrix} \text{column 1} \\ \text{column 2} \\ \vdots \\ \text{last column} \end{bmatrix} \qquad (8.5)$$


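Using this operator, let us define $\psi(t)$ and $g(t)$ to be vectors composed of all the
elements of $\{\Psi_1(t), \ldots, \Psi_q(t)\}$ and $\{G(t), G_1(t), \ldots, G_{q-1}(t)\}$:

$$\psi(t) = \begin{bmatrix} \nu(\Psi_1(t)) \\ \vdots \\ \nu(\Psi_q(t)) \end{bmatrix}, \qquad g(t) = \begin{bmatrix} \nu(G(t)) \\ \nu(G_1(t)) \\ \vdots \\ \nu(G_{q-1}(t)) \end{bmatrix} \qquad (8.6)$$

We can now express $d\psi(t)/dt$ as a linear function of $g(t)$, with coefficients that depend on $\psi(t)$, using (8.2) and (8.6):

$$\frac{d\psi(t)}{dt} = F(t)\,g(t) \qquad (8.7)$$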

where the large matrix F(t) is a function of the elements of $\psi(t)$. If we wish to
hold a given component of $\psi(t)$ fixed, then we must simply set the corresponding
component of $d\psi(t)/dt$ to zero. Thus the dot product of the corresponding row of F(t) and the vector
g(t) equals zero. If several components of $\psi(t)$ are constrained, then let us
stack up all the corresponding rows of F(t) to form a matrix $R_0(t)$. Since the matrix product of $R_0(t)$ and g(t) is a zero vector, we can say that g(t) lies in the
null space of $R_0(t)$. Thus during the optimization procedure, the vector g(t) must
be constrained to lie in this null space, which is a function of the elements of
$\psi(t)$. Chan points out that a nontrivial g(t) satisfying this constraint condition will
exist if the number of entries of $\psi$ held constant is less than the dimension of g(t).
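One way to see why F(t) is linear in g(t), with entries drawn from the Ψ_i, is the identity ν(AXB) = (B' ⊗ A)ν(X) for the column-stacking operator ν of (8.5). Applied to dΨ_i/dt = G_iΨ_i - Ψ_iG_{i-1}, it gives ν(dΨ_i/dt) = (Ψ_i' ⊗ I)ν(G_i) - (I ⊗ Ψ_i)ν(G_{i-1}). The sketch below checks the identity on small random matrices; the sizes are assumptions.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def nu(M):
    """Stack the columns of M into one vector, as in (8.5)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def kron(A, B):
    """Kronecker product of A and B."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

rng = random.Random(2)
def rand(r, c):
    return [[rng.uniform(-1.0, 1.0) for _ in range(c)] for _ in range(r)]

A, X, B = rand(2, 3), rand(3, 2), rand(2, 4)       # hypothetical sizes
lhs = nu(matmul(matmul(A, X), B))
rhs = matvec(kron(transpose(B), A), nu(X))
print(max(abs(a - b) for a, b in zip(lhs, rhs)))   # rounding-level difference
```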

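The next step in the optimization procedure is to express the derivative of
the objective function f(t) in terms of g(t). Using the chain rule, and (8.4):

$$\frac{df}{dt} = \sum_{i=0}^{q} \operatorname{tr}\!\left[\left(\frac{\partial f}{\partial P_i}\right)' \frac{dP_i}{dt}\right] = \sum_{i=0}^{q} \operatorname{tr}\!\left[\left(\frac{\partial f}{\partial P_i}\right)' G_i(t)\,P_i(t)\right] \qquad (8.8)$$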


Now, by stacking the elements of the $G_i$ matrices as in (8.6), we can define the
gradient vector $\xi(t)$, which expresses $df/dt$ as a linear function of g(t):

$$\frac{df}{dt} = \xi'(t)\,g(t) \qquad (8.9)$$

We would like to select the vector g(t) in the $-\xi(t)$ direction, so that $df/dt$
will be as negative as possible. However, keep in mind that g(t) must also satisfy the null space constraint described above. Thus, if we choose g(t) to be a
unit magnitude vector indicating the direction in which the optimization should
proceed while satisfying the constraint, then:

$$g(t) = -\frac{\xi_R(t)}{\|\xi_R(t)\|} \qquad (8.10)$$

where $\xi_R(t)$ is the projection of $\xi(t)$ onto the null space of $R_0$. As explained in
Chan [17], $\xi_R(t)$ can be found by computing:

$$\xi_R(t) = X\,(X'X)^{-1}X'\,\xi(t) \qquad (8.11)$$

where X is a matrix formed from a set of column vectors which form a basis for
the null space of $R_0$.
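The projection and the normalized step direction can be sketched on a tiny example. The one-row constraint matrix R0, its null-space basis, and the gradient vector xi below are hypothetical; the code forms X(X'X)⁻¹X'ξ as in (8.11), the unit direction of (8.10), and confirms that the projected step respects the constraint while df/dt = ξ'g equals -‖ξ_R‖.

```python
import math

def solve(M, b):
    """Solve M y = b by Gauss-Jordan elimination with partial pivoting (small dense M)."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def project_onto_columns(X_cols, xi):
    """xi_R = X (X'X)^{-1} X' xi, with X given as a list of basis column vectors."""
    gram = [[sum(a * b for a, b in zip(u, v)) for v in X_cols] for u in X_cols]
    rhs = [sum(a * b for a, b in zip(u, xi)) for u in X_cols]
    coeffs = solve(gram, rhs)
    return [sum(c * u[k] for c, u in zip(coeffs, X_cols)) for k in range(len(xi))]

R0 = [[1.0, 1.0, 0.0]]                        # hypothetical constraint row
X_cols = [[1.0, -1.0, 0.0], [0.0, 0.0, 1.0]]  # a basis of the null space of R0
xi = [3.0, 1.0, -2.0]                         # hypothetical gradient vector

xi_R = project_onto_columns(X_cols, xi)        # (8.11)
norm = math.sqrt(sum(v * v for v in xi_R))
g = [-v / norm for v in xi_R]                 # (8.10): unit step direction
dfdt = sum(a * b for a, b in zip(xi, g))      # (8.9) along the chosen direction
print(xi_R, dfdt)
```

The basis in X need not be orthonormal; the (X'X)⁻¹ factor in (8.11) compensates for that.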

In order to create an algorithm that will implement the optimization pro- 
cedure as described above, we must divide the continuous parameter t (call it 
’time’) into discrete steps of length h. Thus the optimization algorithm will involve 
a series of computations that produces a new transformed structure at time t+h 
from the transformed structure at time t. This process can be repeated until the 
value f(t+h) of the objective function for the new structure is as small as we 
like, or until no further significant improvement seems likely. 

So for a given structure at time t, we can perform all the computations involved in (8.6)-(8.11). The resulting vector g(t) is used to update the transformation matrices by integrating (8.4). Chan uses the simple Euler integration formula to form a tentative $\bar{P}_i$ for the next time instant t+h:

$$\bar{P}_i(t+h) = P_i(t) + h\,G_i(t)\,P_i(t) \quad \text{for } 0 \le i \le q \qquad (8.12)$$

where h is the integration step size. The reason that this choice is only tentative
is that the new structure formed with the transformations $\bar{P}_i(t+h)$ would not in
general satisfy the scaling constraints of the original structure. We must include
some scaling operation in order that the structure resulting from the transformations $\bar{P}_i(t+h)$ is also scaled as desired. Recall from sections 5.2 and 5.3 that $\ell_2$
scaling involves the diagonal transformation matrices $S_i$ whose elements are the
reciprocal square roots of the diagonal elements of a set of matrices $K_i$. In fact,
the matrices $K_i(t+h)$ for the new transformed structure can be related to the matrices $K_i(0)$ of the original scaled structure by:

$$K_i(t+h) = \bar{P}_i(t+h)\,K_i(0)\,\bar{P}_i'(t+h) \quad \text{for } 1 \le i \le q \qquad (8.13)$$

Note that the diagonal elements of $K_i(0)$ are all unity since we have assumed our
original structure to be scaled. Using (8.13), we can describe the $\ell_2$ scaling
transformations (5.9) that would have to be applied to the structure resulting from
the transformations $\bar{P}_i$ in order to scale it. In particular, the j-th diagonal element
of $S_i$ would be the reciprocal square root of the j-th diagonal element of $K_i(t+h)$. The
diagonal transformation matrices $S_i$ can be combined with the tentative transformation matrices $\bar{P}_i$ to form the scaled transformation matrices $P_i(t+h)$:

$$P_i(t+h) = S_i(t+h)\,\bar{P}_i(t+h) \qquad (8.14)$$

Thus the structure formed by transforming with the matrices $P_i$ above will have
corresponding $K_i$ matrices whose diagonal elements are all unity.
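The rescaling step can be sketched for a single level. Starting from a hypothetical K_i(0) with unit diagonal and a hypothetical tentative transformation from the Euler step, the code forms K_i(t+h) by (8.13), builds S_i from the reciprocal square roots of its diagonal, applies (8.14), and confirms that the transformed K matrix again has a unit diagonal.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

K0 = [[1.0, 0.3], [0.3, 1.0]]        # hypothetical K_i(0) of a scaled structure (unit diagonal)
P_bar = [[1.2, 0.4], [-0.1, 0.8]]    # hypothetical tentative transformation from (8.12)

K_new = matmul(matmul(P_bar, K0), transpose(P_bar))                 # (8.13)
n = len(K_new)
S = [[1.0 / math.sqrt(K_new[i][i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
P_scaled = matmul(S, P_bar)                                          # (8.14)

K_scaled = matmul(matmul(P_scaled, K0), transpose(P_scaled))
print([K_scaled[i][i] for i in range(n)])   # both entries equal 1 up to rounding
```

The check works because the new K matrix is S_i K_i(t+h) S_i, whose j-th diagonal entry is s_j² times the j-th diagonal entry of K_i(t+h), which is 1 by construction.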

Using the transformations in (8.14), we can compute the new modified
state space matrices $\Psi_i(t+h)$ with (8.3). Note that the $\Psi_i$ matrices of the new
structure are always computed by applying the updated transformations of
(8.14) to the matrices of the original structure. In other words, the structure
is not formed by updating the $\Psi_i$ matrices of the previous time step. This method
was used to keep the effects of numerical inaccuracy to a minimum. Even with
the method currently in use, we must consider the fact that the Euler integration
of (8.12) is only an approximation to (8.4). Thus, after computing the new $\Psi_i$ matrices, we must check that the constrained entries in each matrix have not
changed, that is, we must check whether the errors in the constrained entries are less than some preset tolerance. If these errors are too
large, then one approach would be to halve the step size h and repeat the procedure starting with the computation of the tentative transformation updates $\bar{P}_i$ in
(8.12). If in fact the errors are small enough, then we should reevaluate the objective function f(t+h). If the resulting value is not smaller than at time t (and, because of numerical errors, it need not be), then we should again use the approach of
reducing the step size h and repeating the computations starting with the same
updates in (8.12). If the objective function did turn out to be smaller than the
value at time t, then the optimization procedure can continue for the next time
t+2h, starting again with the formation of the vector $\psi$ in (8.6).

The overall algorithm can be summarized as follows: 

(1) Initialize the procedure with $\Psi_1, \Psi_2, \ldots, \Psi_q$ as $\Psi_1(0), \Psi_2(0), \ldots, \Psi_q(0)$ and compute $K_i(0)$ as described in Chapter 5. Evaluate the objective function, and set all $P_i(0)$ to be identity matrices. Initialize h to 1.

(2) Determine the matrix F and the constraint submatrix $R_0$ as defined in (8.6)-(8.7).

(3) Find a set of basis (column) vectors $x_i$ for the null space of $R_0$ and form
them into the matrix X:

$$X = [\,x_1 \;\; x_2 \;\; x_3 \;\; \cdots\,] \qquad (8.15)$$

(4) Express the derivative of the objective function as a function of g(t), that
is, find $\xi(t)$ as defined in (8.9). Find its projection onto the range space of
X using (8.11).

(5) Evaluate g(t) using (8.10).

(6) Compute a tentative set of matrices $\bar{P}_i(t+h)$ by Euler integration (8.12),
and evaluate the corresponding $K_i$ matrices in (8.13).

(7) Scale the $\bar{P}_i(t+h)$ matrices using (8.14) and evaluate the new (scaled)
modified state space matrices $\Psi_i(t+h)$.

(8) Check for errors in the constrained coefficients of $\Psi_i(t+h)$. If any exceed the tolerance, halve h
and return to step 6.

(9) Recompute the objective function f. If it has increased, halve h and return to step 6. Otherwise, return to step 2 unless no further improvement
is desired.
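The step-size control of steps (2)-(9) can be sketched with a toy analogue. In the code below, a hypothetical quadratic objective stands in for the structure-dependent objective, an exact orthogonal projection stands in for (8.11), and steps (7)-(8) are omitted because the toy constraint is linear and so is never violated. Only the control logic is illustrated: a unit-length projected descent direction, h initialized to 1, and h halved whenever a trial step fails to reduce f.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def constrained_descent(f, grad, project, x0, h=1.0, h_min=1e-12, max_iter=500):
    """Toy analogue of steps (2)-(9): projected unit-gradient steps with step halving."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        xi_R = project(grad(x))                   # steps (2)-(4): constrained gradient
        n = math.sqrt(dot(xi_R, xi_R))
        if n < 1e-12:
            break                                 # no admissible descent direction left
        g = [-v / n for v in xi_R]                # step (5), as in (8.10)
        while h >= h_min:
            cand = [a + h * b for a, b in zip(x, g)]   # step (6): Euler-type update
            f_cand = f(cand)
            if f_cand < fx:                       # step (9): accept only improvements
                x, fx = cand, f_cand
                break
            h /= 2.0                              # otherwise halve h and retry
        else:
            break                                 # step size exhausted
    return x, fx

# Hypothetical toy problem: minimize f while holding x0 + x1 fixed.
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2 + x[2] ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0), 2.0 * x[2]]

def project(v):
    # Orthogonal projection onto the null space of the constraint row [1, 1, 0].
    m = (v[0] + v[1]) / 2.0
    return [v[0] - m, v[1] - m, v[2]]

x, fx = constrained_descent(f, grad, project, [0.0, 0.0, 0.0])
print(x, fx)   # converges toward [-0.5, 0.5, 0.0] with f near 4.5
```

As in the text, h is never increased after a successful step, so the run ends once trial steps become too small to improve the objective.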

$8.3 The Minimization of Roundoff Noise Effects in Compensators 

Chan [17,101] applied the general procedure outlined in section 8.2 to the
constrained optimization of filter structures for minimum output roundoff noise
variance. In this section we will adapt this technique to the constrained optimization of compensator structures for minimum roundoff noise effects. In particular,
we will minimize the increase in the performance index J due to roundoff quantization noise. In fact, part of our adaptation can also be applied to generalize the
technique of Chan for digital filters.

To apply the general technique described in section 8.2 we must specify an objective function f, and also express the derivative of f(t) in (8.9) as a function of g(t), or in other words, compute $\xi(t)$. Chan used an approach similar to that described in section 5.6 to form an objective function: the output noise variance was expressed as a function of the matrices $K_j$ and $W_j$, which were discussed in section 5.6 for one-level structures. Recall that these matrices can be found by solving two Lyapunov equations of the same order as the number of unit delays in the filter structure. Chan thus essentially extended the roundoff noise expression derived by Mullis and Roberts and by Hwang to multiple-level filter structures. Chan was then able to define an objective function and derive an expression for its derivative as required in (8.9).

In this section we will adapt Chan's roundoff noise expression to the digital compensator case. Specifically, we will use the context of the modified state space representation, account for the performance of the entire closed-loop system, and specify the objective function to reflect the increase in the performance index J. Thus we will be extending the expression we derived in section 5.6 to the case of multiple-level compensator structures (see (5.44)-(5.49)). We will also show that the expression derived by Chan for the derivative of f applies almost unchanged to the compensator case.

We can extend (5.50) to include multiple precedence levels as follows. Excluding A/D noise, we can rewrite equation (5.33) as follows (tildes represent the quantities of the scaled system):

$$\tilde Z = \tilde A \tilde Z \tilde A' + \frac{\Delta^2}{12}\begin{bmatrix} 0 & 0 \\ 0 & \tilde\Omega \end{bmatrix} \qquad (8.16)$$

where $\tilde A$ is defined in (5.32) and

$$\tilde\Omega = \Lambda_q + \tilde\Psi_q \Lambda_{q-1} \tilde\Psi_q' + \tilde\Psi_q \tilde\Psi_{q-1} \Lambda_{q-2} \tilde\Psi_{q-1}' \tilde\Psi_q' + \cdots + \tilde\Psi_q \cdots \tilde\Psi_2 \Lambda_1 \tilde\Psi_2' \cdots \tilde\Psi_q'$$


Recall that $\Lambda_j$ is a diagonal matrix whose kth diagonal element represents the number of roundoff noise sources associated with the kth row of $\tilde\Psi_j$, $\Delta$ is the quantization step size of the quantizers in the structure, $\tilde A$ contains the parameters of the closed-loop system, and $\tilde Z$ is the steady-state covariance matrix of the plant and compensator states. Also note that the parameter $\Delta$ will have no effect on the optimization procedure described below, or on the procedure to be described in section 8.4. Thus it can be set to 1 if desired.

If we replace $\tilde Z$ with $T Z T'$ as in (5.46), where T is the scaling transformation matrix that relates the original unscaled system of plant and compensator to the scaled system,

$$T = \begin{bmatrix} I & 0 \\ 0 & S_q \end{bmatrix},$$

then (8.16) can be rewritten:


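$$Z = A Z A' + \frac{\Delta^2}{12}\begin{bmatrix} 0 & 0 \\ 0 & \Omega \end{bmatrix} \qquad (8.17)$$

where A is given in (5.45) and $\Omega$ is given below:

$$\Omega = \Lambda_q S_q^{-2} + \Psi_q \Lambda_{q-1} S_{q-1}^{-2} \Psi_q' + \cdots + \Psi_q \cdots \Psi_2 \Lambda_1 S_1^{-2} \Psi_2' \cdots \Psi_q' \qquad (8.18)$$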
The expression for dJ, the increase in the performance index due to roundoff noise, is given in (6.46), and shown below:

$$dJ = \operatorname{trace}(\Gamma Z) \qquad (8.19)$$


Using the adjoint Lyapunov equation as described in Appendix B, and as applied in (5.47)-(5.50), we can express dJ as:

$$dJ = \frac{\Delta^2}{12}\operatorname{trace}\left\{ W \begin{bmatrix} 0 & 0 \\ 0 & \Omega \end{bmatrix} \right\} \qquad (8.20)$$


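where W is given in (5.48). Defining the lower right-hand (n+1)×(n+1) portion of W to be $\bar W$, we can rewrite (8.20):

$$dJ = \frac{\Delta^2}{12}\operatorname{trace}\left( \Lambda_q S_q^{-2} \bar W + \Lambda_{q-1} S_{q-1}^{-2} \Psi_q' \bar W \Psi_q + \Lambda_{q-2} S_{q-2}^{-2} \Psi_{q-1}' \Psi_q' \bar W \Psi_q \Psi_{q-1} + \cdots \right) \qquad (8.21)$$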
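The adjoint Lyapunov identity used to pass from (8.19) to (8.20), and the fact that only the lower right-hand block $\bar W$ is needed, can be checked numerically. This is a sketch assuming SciPy's `scipy.linalg.solve_discrete_lyapunov(a, q)`, which returns X satisfying $X = aXa^H + q$; the random stable A, weighting $\Gamma$, and block $\Omega$ in the check are illustrative stand-ins for the closed-loop quantities, and the helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def cost_direct(A, Q, Gamma):
    """trace(Gamma Z) with Z = A Z A' + Q, as in (8.17) and (8.19)."""
    Z = solve_discrete_lyapunov(A, Q)
    return float(np.trace(Gamma @ Z))

def cost_adjoint(A, Q, Gamma):
    """trace(W Q) with the adjoint equation W = A' W A + Gamma."""
    W = solve_discrete_lyapunov(A.T, Gamma)
    return float(np.trace(W @ Q))

def roundoff_dJ(A, Gamma, Omega, delta=1.0):
    """(8.20): only the lower right block W_bar of W enters, dJ = delta^2/12 trace(W_bar Omega)."""
    W = solve_discrete_lyapunov(A.T, Gamma)
    m = Omega.shape[0]
    W_bar = W[-m:, -m:]
    return float(delta ** 2 / 12.0 * np.trace(W_bar @ Omega))
```

Both routes solve one Lyapunov equation, but the adjoint form solves it once for $\Gamma$ and can then be reused for any noise covariance, which is what makes the later per-level decomposition cheap.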


Once we have reached this point, the remainder of the development is very similar to that of Chan [17]. As Chan has done, we can now define the matrices $W_j$. Using a recursive definition,

$$W_q = \bar W, \qquad W_{j-1} = \Psi_j' W_j \Psi_j, \quad j = q, q-1, \ldots, 2 \qquad (8.22)$$
These matrices are called noise gain matrices by Chan, since their diagonal elements reflect the gain from each roundoff noise source variance to the variance of the filter output. For our development, they will represent the gain from each roundoff noise source variance to the increase dJ in the performance index. Applying (8.22), equation (8.21) can be further simplified:


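$$dJ = \frac{\Delta^2}{12}\sum_{j=1}^{q}\operatorname{trace}\left(\Lambda_j S_j^{-2} W_j\right) \qquad (8.23)$$

or equivalently,

$$dJ = \frac{\Delta^2}{12}\sum_{j=1}^{q}\sum_{k}[\Lambda_j]_{kk}\,[S_j^{-2}]_{kk}\,[W_j]_{kk} \qquad (8.24)$$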


Thus only the diagonal elements of $W_j$ appear in (8.24). Since the diagonal elements of $K_j$ equal the diagonal elements of $S_j^{-2}$ (and in fact all equal one for a scaled structure), we can eliminate $S_j$ in favor of $K_j$, at least so far as the evaluation of (8.24) is concerned. Since the scale factor $\Delta^2/12$ will not affect the minimization process in any way, we can formulate the following objective criterion for the effects of roundoff noise:

$$f = \sum_{j=1}^{q}\sum_{k}[\Lambda_j]_{kk}\,[K_j]_{kk}\,[W_j]_{kk} \qquad (8.25)$$
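The noise gain recursion (8.22) and the objective (8.25) can be sketched in a few lines of NumPy. This assumes a scaled structure, so every $K_j$ has a unit diagonal and (8.25) equals (8.24) without the factor $\Delta^2/12$; $\Psi_j$ is taken to map the level j-1 nodes to the level j nodes, and the function names are hypothetical.

```python
import numpy as np

def noise_gains(W_bar, psis):
    """psis = [Psi_1, ..., Psi_q]; returns [W_1, ..., W_q] via (8.22)."""
    Ws = [W_bar]                          # W_q = W_bar
    for psi in reversed(psis[1:]):        # Psi_q, ..., Psi_2
        Ws.append(psi.T @ Ws[-1] @ psi)   # W_{j-1} = Psi_j' W_j Psi_j
    return Ws[::-1]

def roundoff_objective(lams, Ks, Ws):
    """(8.25): f = sum_j sum_k [Lambda_j]_kk [K_j]_kk [W_j]_kk."""
    return sum(float(np.sum(np.diag(L) * np.diag(K) * np.diag(W)))
               for L, K, W in zip(lams, Ks, Ws))
```

For a two-level structure the result matches $\operatorname{trace}(\bar W \tilde\Omega)$ with $\tilde\Omega = \Lambda_2 + \Psi_2 \Lambda_1 \Psi_2'$, which is the content of going from (8.21) to (8.23) when every $S_j = I$.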

Now we must turn to the task of expressing the derivative of f as a function of g(t). Chan [17] has shown that the digital filtering $K_j$ and $W_j$ matrices have the following derivatives:

$$\frac{dK_j(t)}{dt} = G_j(t)K_j(t) + K_j(t)G_j'(t), \qquad \frac{dW_j(t)}{dt} = -G_j'(t)W_j(t) - W_j(t)G_j(t) \qquad (8.26)$$


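These will apply equally well to the compensator case with its analogous $K_j$ and $W_j$ matrices. Using (8.24)-(8.26) and following the method used in Chan, we can write the derivative of the objective function f:

$$\frac{df}{dt} = \sum_{j=1}^{q}\sum_{k}[\Lambda_j]_{kk}\left(\left[\frac{dK_j}{dt}\right]_{kk}[W_j]_{kk} + [K_j]_{kk}\left[\frac{dW_j}{dt}\right]_{kk}\right) \qquad (8.27)$$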


After substituting for the derivatives in (8.27) with the expressions in (8.26) and some manipulation as in Chan [17], we arrive at the following compact expression:

$$\frac{df}{dt} = \sum_{j=1}^{q}\operatorname{trace}\left(M_j'(t)\,G_j(t)\right) \qquad (8.28)$$

where

$$[M_j(t)]_{kl} = 2\left([K_j(t)]_{kl}\,[\Lambda_j]_{kk}\,[W_j(t)]_{kk} - [\Lambda_j]_{ll}\,[W_j(t)]_{kl}\right) \qquad (8.29)$$

The quantity $\xi$ needed in the optimization procedure can easily be obtained from (8.28) and (8.29).
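A finite-difference check of (8.26)-(8.29) for a single level is a useful sanity test. The sketch assumes $P(t) = e^{tG}$, so that $dP/dt = GP$ with $P(0) = I$, and transforms $K \to PKP'$ and $W \to P'^{-1} W P^{-1}$, which reproduces (8.26) at t = 0. It keeps the factor $[K]_{ll}$ in the second term so that unscaled structures are also covered; with $[K]_{ll} = 1$ it reduces to (8.29). All names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def objective(K, W, lam):
    """One level of (8.25): sum_k lam_k [K]_kk [W]_kk."""
    return float(np.sum(lam * np.diag(K) * np.diag(W)))

def m_matrix(K, W, lam):
    """[M]_kl = 2([K]_kl lam_k [W]_kk - lam_l [K]_ll [W]_kl)."""
    dK, dW = np.diag(K), np.diag(W)
    return 2.0 * (K * (lam * dW)[:, None] - W * (lam * dK)[None, :])

def check_derivative(seed=0, n=4, eps=1e-6):
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)); K = a @ a.T + n * np.eye(n)
    b = rng.standard_normal((n, n)); W = b @ b.T + n * np.eye(n)
    lam = rng.integers(0, 4, n).astype(float)
    G = rng.standard_normal((n, n))

    def f_of_t(t):
        P = expm(t * G)                   # dP/dt = G P, P(0) = I
        Pinv = np.linalg.inv(P)
        return objective(P @ K @ P.T, Pinv.T @ W @ Pinv, lam)

    numeric = (f_of_t(eps) - f_of_t(-eps)) / (2.0 * eps)   # central difference
    analytic = float(np.trace(m_matrix(K, W, lam).T @ G))  # (8.28), one level
    return numeric, analytic
```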

Clearly, the $K_j$ and $W_j$ matrices in (8.29) are defined differently from those derived for digital filters. Other than this, there are two external differences between our expression in (8.29) and the original expression derived by Chan. First, the lack of the factor $[K_j]_{ll}$ in the second term of (8.29) is due to the fact that all the diagonal elements of $K_j$ are unity; recall that we assumed that our original structure was scaled. This is largely a procedural difference between Chan's derivation and our own; in terms of the optimization, it makes no difference. The second difference is the presence of the $\Lambda_j$ term. Recall that the kth diagonal entry of $\Lambda_j$ represents the exact number of roundoff noise sources associated with the kth row of $\Psi_j$. During the optimization procedure, any unconstrained unity or zero entries in these $\Psi_j$ matrices will in general become non-zero and non-unity. Thus these new sources must also be included in the $\Lambda_j$ matrices at the beginning of the optimization procedure. Inclusion of the $\Lambda_j$ terms allows us to consider all possible structures. The assumption made by Chan, that the $\Lambda_j$ matrices can be taken to be identities or proportional to identities, can often be in error, especially for structures with multiple precedence levels and few coefficients; the result will then be only an approximately optimized structure. Our inclusion of the $\Lambda_j$ terms can easily be incorporated into the digital filtering optimization results of Chan. This is one example where our results can be applied to digital filters.

With the optimization procedure derived by Chan and the correct initial conditions, a structure identical to the minimum roundoff noise filter structure derived in closed form by Mullis and Roberts can be found. Similarly, using the adapted optimization procedure described above and the correct initial conditions, we can also duplicate the minimum roundoff noise compensator structure that we derived in section 5.6. To achieve this result for compensator structures, we must allow all the coefficients (except the next-to-last column of $\Psi_1$) of a one-level initial scaled structure to vary. Thus all the diagonal entries of the $\Lambda_1$ matrix must be set to n+1. Similarly, by allowing only 2 by 2 diagonal blocks of coefficients, plus the last row and column (input and output coefficients) of $\Psi_1$, to vary, we can optimize and produce a block optimal structure. In fact this procedure was used to generate the (one-level) block optimal F8 compensator structure studied in Chapters 5 and 6.

This optimization procedure was also applied to the (scaled) two-level parallel F8 compensator structure composed of direct form II sections designated as (c) in Chapters 5 and 6. Its modified state space (before optimization) is shown below:

$$\begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ c & c & c & c & c \end{bmatrix} \qquad (8.30)$$


where each entry c represents a coefficient. Two extra coefficients were added by allowing two other entries in $\Psi_1$ to vary, the (5,5) and (5,6) entries. Thus there will be 17 coefficients in total in the optimized structure. For this example, the matrices $\Lambda_1$ and $\Lambda_2$ will be:

$$\Lambda_1 = \operatorname{diag}[0\;\;3\;\;0\;\;3\;\;2\;\;3], \qquad \Lambda_2 = \operatorname{diag}[0\;\;0\;\;0\;\;0\;\;0\;\;0\;\;6] \qquad (8.31)$$
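The diagonal entries of $\Lambda_j$ can be tallied mechanically from the precedence level matrices. The sketch below assumes one roundoff source per multiplier coefficient in a row, and treats 0, ±1, and powers of two as shifts rather than multiplies, following the definition of multiplier coefficients given in section 8.4; the helper names are hypothetical. It also confirms that the weights in (8.31) account for the 17 coefficients.

```python
import numpy as np

def multiplier_mask(M, tol=1e-12):
    """True where an entry needs a genuine multiply: not 0, +-1, or +-2**k."""
    a = np.abs(np.asarray(M, dtype=float))
    safe = np.where(a > tol, a, 1.0)                # avoid log2(0)
    nearest_pow2 = 2.0 ** np.round(np.log2(safe))
    return (a > tol) & (np.abs(a - nearest_pow2) > tol)

def noise_count_matrix(psi):
    """Lambda_j: diagonal matrix of multiplier counts, one entry per row of Psi_j."""
    return np.diag(multiplier_mask(psi).sum(axis=1))

# The weights quoted in (8.31) for the modified parallel structure (c)
lam1 = np.diag([0, 3, 0, 3, 2, 3])
lam2 = np.diag([0, 0, 0, 0, 0, 0, 6])
total_coefficients = int(np.trace(lam1) + np.trace(lam2))   # 11 + 6 = 17
```

For instance, a row [0.3, -0.7, 1.0] contributes two sources, while [0.5, 2.2, 0.125] contributes one, since 0.5 and 0.125 are powers of two.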


Before optimization, the scaled coefficient values ranged from 10.48 to 0.073. Figure 8-1 shows the range of coefficient values and the resulting number of signal bits necessary to hold the value of dJ due to roundoff to 6% (as in Chapter 6) after each iteration of the optimization process:


Iteration Number | Number of Bits | Coefficient Range
0 | 10.46 | 10.48 to 0.073
1 | 8.745 | 3.2 to 0.108
2 | 8.067 | 1.46 to 0.108
3 | 8.2 | 1.46 to 0.108
4 | 8.086 | 1.46 to 0.108
5 | 8.06 | 1.46 to 0.108
6 | 8.066 | 1.46 to 0.108
7 | 8.057 | 1.46 to 0.108
8 | 8.056 | 1.46 to 0.108
9 | 8.065 |

Figure 8-1: Roundoff Noise Optimization Results


Without including the 2 extra coefficients, which alter $\Lambda_1$ and increase the apparent required wordlength of the initial structure to 10.46, the number of bits needed (see figure 5-8) was 10.18. Thus the true improvement resulting from the optimization was 2.12 bits. This is quite impressive, since it was attained in essentially only two iterations and is quite close to the block optimal value of 7.88 bits, which requires 26 coefficients. In fact, it is identical to the performance of the 17-coefficient parallel structure (b). We note that iterations 3, 4, 6, and 7 involved halving the integration step size due to increases in dJ over the current least value, and that the value after iteration 9 was actually lower than the value after iteration 2, but not appreciably. In the digital filter examples treated by Chan in [17,101], typically only 5 to 8 iterations were required to achieve the full benefits of the optimization procedure. The block optimal compensator structure computed via this optimization procedure took 5 iterations to reach the approximate minimum wordlength.

A byproduct of the optimization procedure for the figure 8-1 example was a 
reduction in the maximum coefficient value. Instead of needing 4 integer bits to 
represent the largest coefficient (see Chapter 6), the optimized structure required 
only one, again an impressive savings in wordlength. Intuitively, this savings may 
exist for any increase in the number of coefficients in a structure. This point 
needs more investigation. 

§8.4 The Minimization of Coefficient Wordlength in Compensators

In this section we will develop an objective function for the minimization of 
coefficient rounding effects on the performance index J. Basically, we will use
the MSWL expression as presented in Chapter 6. The optimization could just as 
well be carried out for the SWL estimate, but the MSWL is simpler to compute and 
still tightly related to the more accurate SWL value. This objective function is 
quite different from the one developed in Chan [17], since it is based on J, and 
hence involves second-order sensitivities. Again, as with the SWL and MSWL 
derivations, this development will be useful in digital filtering for filters that are 
designed by optimizing some scalar differentiable criterion. 

Instead of minimizing the actual MSWL value, we will copy the approach of the previous section: rather than minimizing the actual required wordlength, we will minimize the expected value of dJ. Of course, for the analysis of finite wordlength coefficient effects, this expected value is over an ensemble of structures; it is not a time average as in the roundoff noise case. Reviewing the results of Chapter 6, E(dJ) can be written:


$$E(dJ) = \frac{\sigma^2}{2}\sum_{m=1}^{N}\frac{\partial^2 J}{\partial \tilde c_m^2} \qquad (8.32)$$


where N is the number of non-zero, non-unity, and non-power-of-two coefficients in 
the structure. Thus we can drop the scale factor as we did with the roundoff 
noise objective function to form a new f: 



$$f = \sum_{m=1}^{N}\frac{\partial^2 J}{\partial \tilde c_m^2} \qquad (8.33)$$


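where

$$\frac{\partial^2 J}{\partial \tilde c_m^2} = \operatorname{trace}\left(\tilde\Gamma\,\frac{\partial^2 \tilde Z}{\partial \tilde c_m^2}\right) \qquad (8.34)$$

and

$$\frac{\partial^2 \tilde Z}{\partial \tilde c_m^2} = \tilde A\,\frac{\partial^2 \tilde Z}{\partial \tilde c_m^2}\,\tilde A' + \left(\text{terms in } \frac{\partial \tilde A}{\partial \tilde c_m} \text{ and } \frac{\partial \tilde Z}{\partial \tilde c_m}\right) \qquad (8.35)$$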


and $\tilde\Gamma$ contains the performance index weighting matrices as shown in (5.34). The tilde again refers to the parameters of the scaled system. We can also write the expressions (8.34) and (8.35) for the compensator before it is scaled, resulting in:
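$$\frac{\partial^2 J}{\partial a_m^2} = \operatorname{trace}\left(\Gamma\,\frac{\partial^2 Z}{\partial a_m^2}\right) \qquad (8.36)$$

and

$$\frac{\partial^2 Z}{\partial a_m^2} = A\,\frac{\partial^2 Z}{\partial a_m^2}\,A' + \left(\text{terms in } \frac{\partial A}{\partial a_m} \text{ and } \frac{\partial Z}{\partial a_m}\right) \qquad (8.37)$$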


where $a_m$ represents a coefficient of the unscaled structure. As with the approach for roundoff noise minimization discussed in section 8.3, we would like to express f as a function of the unscaled parameters and the scaling matrices $S_j$. This is necessary for the computation of the derivative of f: even though the original structure selected for the optimization will be scaled, and its $S_j$ matrices will be identities, they do affect the derivative of f.

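The $\partial^2 J/\partial \tilde c_m^2$ terms can be related to the $\partial^2 J/\partial a_m^2$ terms as follows. Since $\tilde\Psi_j = S_j \Psi_j S_{j-1}^{-1}$ with diagonal $S_j$, a scaled coefficient $\tilde c_m$ at index (k,l) in the matrix $\tilde\Psi_j$ can be related to its unscaled counterpart by:

$$\tilde c_m = [\tilde\Psi_j]_{kl} = \frac{[S_j]_{kk}}{[S_{j-1}]_{ll}}\,a_m \qquad (8.38)$$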


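Since $\tilde c_m$ is thus a multiple of $a_m$, we can write:

$$\frac{\partial^2 J}{\partial \tilde c_m^2} = \frac{\partial^2 J}{\partial a_m^2}\,\frac{[S_{j-1}]_{ll}^2}{[S_j]_{kk}^2} \qquad (8.39)$$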


We can express this relationship compactly for all the coefficients in level j as:

$$\mathcal{V}_2(\tilde\Psi_j) = S_j^{-2}\,\mathcal{V}_2(\Psi_j)\,S_{j-1}^{2} \qquad (8.40)$$




where $\mathcal{V}_2(M)$ is a matrix function whose dimensions match those of its argument matrix M and whose (k,l)th element is $\partial^2 J/\partial [M]_{kl}^2$ only if the (k,l) location in M corresponds to a multiplier coefficient in the transformed structure, and zero otherwise. Recall that all entries in the precedence level matrices of a structure whose ideal values cannot be represented exactly with a finite number of bits are multiplier coefficients. Thus zero, unity, and power-of-two entries are not considered to be multiplier coefficients.
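Relations (8.38)-(8.40) amount to an elementwise rescaling of the unscaled second-order sensitivities. The following is a NumPy sketch under the assumptions that the scaling matrices are supplied as vectors of their diagonal entries and that the multiplier locations are given as a boolean mask; the function name `scaled_sensitivities` is hypothetical.

```python
import numpy as np

def scaled_sensitivities(v2_unscaled, mask, s_j, s_jm1):
    """(8.40): V2(Psi~_j) = S_j^{-2} V2(Psi_j) S_{j-1}^{2}.

    v2_unscaled[k, l] holds d^2 J / d[Psi_j]_kl^2; entries that are not
    multiplier coefficients (mask False) are zeroed, as in the definition of V2.
    """
    v2 = np.where(mask, v2_unscaled, 0.0)
    return np.diag(s_j ** -2.0) @ v2 @ np.diag(s_jm1 ** 2.0)
```

Summing the entries of the result over all levels gives the scaled objective (8.33) for that structure, which is why the unscaled sensitivities need be computed only once.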

To compute the derivative of f with respect to t, we will need to determine the relationship of the second-order sensitivities of J with respect to the coefficients of the transformed structure to the second-order sensitivities of J with respect to the untransformed coefficients. The general transformation matrices $P_j$ in (8.1) and (8.2) are not diagonal, so the simple expression in (8.40) cannot be used. In fact, for the coefficient $c_m$ in the jth level of the transformed structure, the term $\partial^2 J/\partial c_m^2$ will now be related to all the second partials of J with respect to the entries in $\Psi_j$ that correspond to multiplier coefficients in the transformed structure, including the mixed second partials. To demonstrate this, the following matrix chain rule can be applied [102]: if x and y are scalars and M is a matrix, then:
$$\frac{\partial y}{\partial x} = \operatorname{trace}\left[\left(\frac{\partial y}{\partial M}\right)'\frac{\partial M}{\partial x}\right] \qquad (8.41)$$


For this expression, the derivative of a scalar with respect to a matrix M is defined to be the matrix whose (k,l)th element is the derivative of the scalar with respect to the (k,l) element of M. Now let the precedence level matrices associated with the transformed structure be designated with the tilde symbol. By applying the matrix chain rule with J as y, the coefficient at index (k,l) in $\tilde\Psi_j$ as x, and $\Psi_j$ as M, we get:



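$$\frac{\partial J}{\partial [\tilde\Psi_j]_{kl}} = \operatorname{trace}\left[\left(\frac{\partial J}{\partial \Psi_j}\right)'\frac{\partial \Psi_j}{\partial [\tilde\Psi_j]_{kl}}\right] \qquad (8.42)$$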


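Recall from (8.1) that the relationship between the transformed and untransformed precedence level matrices can be written $\tilde\Psi_j = P_j \Psi_j P_{j-1}^{-1}$. Thus the second term in the trace of (8.42) can be written:

$$\frac{\partial \Psi_j}{\partial [\tilde\Psi_j]_{kl}} = P_j^{-1} E_{kl} P_{j-1} \qquad (8.43)$$

where $E_{kl}$ is the matrix with a one in position (k,l) and zeros elsewhere.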
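Evaluating (8.42)-(8.43) for every (k,l) at once gives the gradient with respect to the transformed entries as the single matrix product $P_j'^{-1}(\partial J/\partial\Psi_j)P_{j-1}'$. The sketch below assumes $\tilde\Psi_j = P_j\Psi_j P_{j-1}^{-1}$ as in (8.1) and does not apply the multiplier-location restriction discussed next; the function name is hypothetical.

```python
import numpy as np

def grad_wrt_transformed(grad_psi, P_j, P_jm1):
    """All dJ/d[Psi~_j]_kl from (8.42)-(8.43).

    trace((dJ/dPsi_j)' P_j^{-1} E_kl P_{j-1}) is the (k,l) entry of
    P_j^{-T} (dJ/dPsi_j) P_{j-1}^T, computed here with a linear solve.
    """
    return np.linalg.solve(P_j.T, grad_psi) @ P_jm1.T
```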


Note that equation (8.42) seems to imply that the derivative $\partial J/\partial [\tilde\Psi_j]_{kl}$ is a function of the matrix $\partial J/\partial \Psi_j$, which involves derivatives with respect to all the entries of $\Psi_j$, not just the few coefficient entries. This would imply a tremendous computational load, especially when second derivatives were considered: for a seventh-order compensator with two precedence levels and 7 intermediate nodes, we would have to compute mixed second partials with respect to all the entries in each level, or $49^2 + 49^2$ second derivatives. This number would be independent of the number of actual multipliers in the structure. Even though this computation need only be performed once at the start of the optimization process, it would involve far too much computation time. Fortunately, it is not necessary to compute all the derivatives above. Since the matrices $P_j$ are constrained not to vary certain fixed entries in the $\Psi_j$ matrices, the matrix in equation (8.43) will have a special property: it will have zero entries in exactly those locations which eliminate the dependence of (8.42) on derivatives with respect to $\Psi_j$ entries that are not in the same locations as the multiplier coefficients of the transformed structure. In other words, the derivatives of J with respect to the multiplier coefficients in the transformed structure will be functions only of the derivatives of J with respect to the $\Psi_j$ entries in exactly the same locations as those multiplier coefficients.


To reflect this fact, the term $\partial J/\partial \Psi_j$ in (8.42) should be replaced by $\mathcal{Y}(\Psi_j)$, where $\mathcal{Y}(M)$ is a matrix function whose dimensions match those of its argument matrix M and whose (k,l)th element is $\partial J/\partial [M]_{kl}$ only if the (k,l)th location in M corresponds to a multiplier coefficient (non-zero, non-unity, etcetera) in the transformed structure, and zero otherwise. Note that this definition of $\mathcal{Y}(M)$ is analogous to the definition of $\mathcal{V}_2(M)$ in (8.40). Thus we can rewrite (8.42):
$$\frac{\partial J}{\partial [\tilde\Psi_j]_{kl}} = \operatorname{trace}\left[\mathcal{Y}(\Psi_j)'\,\frac{\partial \Psi_j}{\partial [\tilde\Psi_j]_{kl}}\right] \qquad (8.44)$$

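To relate the second derivatives of J with respect to the coefficients of the transformed and untransformed structures, let us take the derivative of (8.44):

$$\frac{\partial^2 J}{\partial [\tilde\Psi_j]_{kl}^2} = \operatorname{trace}\left[\left(\frac{\partial \mathcal{Y}(\Psi_j)}{\partial [\tilde\Psi_j]_{kl}}\right)'\frac{\partial \Psi_j}{\partial [\tilde\Psi_j]_{kl}}\right] \qquad (8.45)$$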


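Inside the trace expression above, the matrix chain rule (8.41) can be applied to each non-zero element of the derivative of $\mathcal{Y}(\Psi_j)$. For example, if the (r,s) entry of $\Psi_j$ is also a multiplier coefficient, then:

$$\frac{\partial [\mathcal{Y}(\Psi_j)]_{rs}}{\partial [\tilde\Psi_j]_{kl}} = \operatorname{trace}\left[\left(\frac{\partial}{\partial \Psi_j}\,\frac{\partial J}{\partial [\Psi_j]_{rs}}\right)'\frac{\partial \Psi_j}{\partial [\tilde\Psi_j]_{kl}}\right] \qquad (8.46)$$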


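We will define this trace to be $[B_j]_{rs}$. Interchanging the order of differentiation, and applying the same reasoning that eliminated the extra derivative terms in (8.42), we can express (8.46) as follows:

$$[B_j]_{rs} = \sum_{(u,v)} \frac{\partial^2 J}{\partial [\Psi_j]_{rs}\,\partial [\Psi_j]_{uv}}\left[\frac{\partial \Psi_j}{\partial [\tilde\Psi_j]_{kl}}\right]_{uv} \qquad (8.47)$$

where the sum runs over the locations (u,v) of the multiplier coefficients of the transformed structure.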


Note the presence of the mixed second partial derivatives of J in (8.47). Let us




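define the matrix $B_j$ to have non-zero entries as described in (8.47) where (r,s) is the location of a multiplier coefficient in the transformed structure, and zero otherwise. With this definition, (8.45) can be rewritten:

$$\frac{\partial^2 J}{\partial [\tilde\Psi_j]_{kl}^2} = \operatorname{trace}\left[B_j'\,\frac{\partial \Psi_j}{\partial [\tilde\Psi_j]_{kl}}\right] \qquad (8.48)$$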


Thus with (8.47) and (8.48), we have now fully described the relationship 
between the second partial derivatives of J with respect to the coefficients of 
the transformed structure and the mixed second partials of J with respect to the 
corresponding coefficients in the untransformed structure. 
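For a J that is quadratic in the entries of $\Psi_j$, (8.47) and (8.48) can be evaluated with a single Hessian-vector product. This is a sketch assuming the Hessian is stored over the row-major vectorization of $\Psi_j$ and that no multiplier mask is applied (all mixed partials are kept); the function name is hypothetical.

```python
import numpy as np

def second_derivative_transformed(hess, P_j, P_jm1, k, l):
    """d^2 J / d[Psi~_j]_kl^2 for J quadratic in vec(Psi_j) with Hessian `hess`.

    D = dPsi_j / d[Psi~_j]_kl = P_j^{-1} E_kl P_{j-1} is constant, so
    B_j = reshape(hess @ vec(D)) collects the mixed partials as in (8.47)
    and the result is trace(B_j' D) as in (8.48).
    """
    E = np.zeros((P_j.shape[0], P_jm1.shape[0]))
    E[k, l] = 1.0
    D = np.linalg.solve(P_j, E) @ P_jm1
    B = (hess @ D.ravel()).reshape(D.shape)
    return float(np.sum(B * D))        # trace(B' D)
```

Because only `hess` depends on the sensitivities, it can be computed once, while D changes with the transformation; this is the same split that the text exploits.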

We can include scaling in this formulation by applying the results derived in (8.38)-(8.40) to the transformed structure. Thus, the complete expression for the objective function f(t) will be:

$$f(t) = \sum_{j=1}^{q} \mathbf{1}'\,S_j^{-2}\,\mathcal{V}_2(\tilde\Psi_j)\,S_{j-1}^{2}\,\mathbf{1} \qquad (8.49)$$

where $\mathbf{1}$ is a vector of ones and

$$[\mathcal{V}_2(\tilde\Psi_j)]_{kl} = \operatorname{trace}\left[B_j'\,\frac{\partial \Psi_j}{\partial [\tilde\Psi_j]_{kl}}\right] \qquad (8.50)$$

at the multiplier locations (k,l). Here $\tilde\Psi_j$ now represents the coefficient parameters after transformation but before scaling, and $B_j$ is a function of all the mixed second partials of J with respect to the entries in $\Psi_j$ that correspond to coefficients in the transformed scaled structure. Given that the transformed scaled structure will have N coefficients, the advantage of the above formulation of f is that the $N^2$ mixed second-order coefficient sensitivities need only be computed once, at the start of the optimization procedure. The $N^2$ Lyapunov equations that will have to be solved for these sensitivities cannot be simplified through any application of the adjoint Lyapunov operator described in Appendix B. However, the entire set of equations will have the form (6.21) as described in section 6.4, and can be simplified in the same manner as the computations involved in (6.21). Specifically, the first step of the Lyapunov solution method can be bypassed for all $N^2$ equations. As before, this saves at least 76% of the total computation time involved in such solutions.

Now that we have formulated an expression for f(t), we can examine its derivative with respect to the transformation parameter t. Following the procedure of (8.8), we must first evaluate the derivatives of f with respect to the matrices $P_j$, and then multiply the resulting terms by the matrices $G_j P_j$ for all j. From (8.49), $\partial f/\partial P_j$ will involve the derivative of $S_j^{-2}$, which is a matrix composed of the diagonal elements of $K_j$, and the derivative of its inverse. These can be found by applying (8.26) and a simple matrix identity for the derivative of a matrix inverse [102]. The derivative $\partial f/\partial P_j$ will also involve the derivative $\partial \mathcal{V}_2(\tilde\Psi_j)/\partial P_j$. This term can be computed easily, since the expression for $\mathcal{V}_2(\tilde\Psi_j)$ in (8.50) and the expression involving $B_j$ in (8.47) are direct functions of the $P_j$ matrices. All the other terms in (8.47) and (8.50) do not depend on the $P_j$ matrices. The actual formation of $\xi(t)$ in (8.9) from the resulting derivative expressions will be quite tedious, but it is really only a matter of bookkeeping. As a whole, the method we have described above is computationally quite efficient. We have not tested the optimization procedure of section 8.2 in the context of the statistical wordlength-based objective function (8.49) for an actual example.

§8.5 Criteria For Selecting Unconstrained Coefficients

As stated by Chan [101], one of the major open issues concerning this optimization procedure relates to selecting which entries in the $\Psi_j$ matrices will be constrained. For the optimization of parallel or cascade compensator structures
composed of second-order sections, we have formulated some general guidelines 
that seem appropriate. As will be shown, these guidelines can be applied equally 
well to the digital filtering case. 

For the optimization of roundoff noise effects, the block optimal form of Mullis and Roberts and of Hwang still tends to have too many coefficients, as compared with structures of nearly the same performance. However, it is possible to use block optimal sections combined with direct form II sections, thereby saving several coefficients. In order to select the section that should be converted to a block optimal section, we must examine the objective function f in (8.25). Recall that f depends only on the diagonal elements of the matrices $\Lambda_j$ and $W_j$ (the diagonal of each $K_j$ is unity for a scaled structure). The matrices $\Lambda_j$ reflect the number of roundoff sources associated with the rows of the matrices $\Psi_j$, and the diagonal elements of the matrices $W_j$ contain the gains from the variances of the intermediate nodes at level j to the performance index J. For a parallel direct form II structure (see (3.25) and figure 3-8), which has two levels, the diagonal elements of $W_1$ will be pairwise associated with the specific second-order sections. Since we know the weights $[\Lambda_1]_{kk}$, the relative diagonal values of $W_1$ will indicate which sections in the structure contribute the most to the objective function f. The matrix $W_2$ for a parallel direct form II structure will not be important to this consideration, since $\Psi_2$ contains multiplier coefficients, and hence roundoff sources, that only affect the output node. Recall from (8.1) that this node cannot be altered by the optimization procedure.

Let us consider the example treated in section 8.3. For this structure, the parallel structure (c), the diagonal values of $W_1$ were as follows:

$$[W_1]_{kk},\; 1 \le k \le 6: \quad [1.71,\; 7.32,\; .092,\; .264,\; 342,\; 465] \qquad (8.51)$$

Since the diagonal values of $W_1$ are pairwise associated with the three second-order sections of this example, we can easily identify the third section as the trouble spot: the third pair of values (342, 465) is clearly the largest, given the weights in (8.31). This fact justifies the specific location of the two extra coefficients chosen to be varied. In fact, if we had allowed the section to be truly a block optimal section, it would have required three extra coefficients and not two. However, in this example there are indications that the performance with two could be quite excellent; hence one should not automatically go to a block optimal section. Certainly, this point requires further investigation.
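The section-ranking argument can be made concrete by weighting each diagonal entry of $W_1$ by the corresponding entry of $\Lambda_1$ and summing over the pair of nodes belonging to each second-order section. The sketch assumes a scaled structure ($[K_1]_{kk} = 1$) and consecutive node pairs per section; the function name is hypothetical, and the numbers are those quoted in (8.31) and (8.51).

```python
import numpy as np

def section_contributions(lam_diag, w_diag, section_size=2):
    """Each section's share of f = sum_k [Lambda_1]_kk [W_1]_kk (scaled structure)."""
    terms = np.asarray(lam_diag, dtype=float) * np.asarray(w_diag, dtype=float)
    return terms.reshape(-1, section_size).sum(axis=1)

lam1 = [0, 3, 0, 3, 2, 3]                      # (8.31)
w1 = [1.71, 7.32, 0.092, 0.264, 342, 465]      # (8.51)
contrib = section_contributions(lam1, w1)      # about [21.96, 0.792, 2079.0]
worst_section = int(np.argmax(contrib)) + 1    # 1-based section index
```

Here the third section accounts for roughly 2079 of a total near 2102, consistent with the choice of entries to unconstrain.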

When optimizing only a portion of a structure as discussed here, it is 
necessary to know the performance level that would result for the block optimal 
case, so that one can judge the effectiveness of using fewer coefficients. This 
value can be found using this same optimization procedure, but with more uncon- 
strained ¥ entries (more multiplier coefficients). Note that this approach to deter- 
mining which section of a structure to optimize can also be adapted to include 
cascade structures. We should also mention that the above guidelines will of course not be very effective if the diagonal elements of $W_1$ tend to be similar in magnitude.

A similar guideline may be used when minimizing coefficient wordlength. As mentioned in Chapter 6, by computing the MSWL or SWL, we have already computed the second partials of J with respect to the coefficients in the structure. Furthermore, the SWL computation will also produce the mixed second partials of J. It is precisely these sensitivities that we need to produce f in (8.49). We would simply have to compute the SWL of the original structure $\{\Psi_j(0)\}$ and save the sensitivities. If any of the second-order sensitivities $\partial^2 J/\partial a_m^2$ of the original structure are particularly large compared to the others, then the second-order section in which those coefficients reside would be a likely candidate for optimization. In particular, any zero or unity entries in the portion of the $\Psi_j$ matrix corresponding to that section should be unconstrained in the optimization procedure. Such a section, when optimized, will have the same form of modified state space representation as a block optimal section, but it will be optimal with respect to a different criterion.

Although the criteria presented above by no means fully answer the ques- 
tion of which ^ entries to constrain, they do provide an important guide in situa- 
tions where performance and minimal numbers of coefficient multiplies are impor- 
tant. 

In one sense, the constraint issue is part of a larger topic: the selection of an initial structure from which to optimize. One property of the iterative constrained optimization procedure described in this chapter is that the number of precedence levels is fixed during the optimization. Therefore, optimizing a two-level structure for some objective function does not tell us whether an extra level will significantly improve performance, or whether one less level can be used without degrading performance. In general, more levels provide more degrees of freedom for the optimization, but of course this will depend on the number of constrained coefficients and their locations in the $\Psi_j$ matrices. For now, these questions must be dealt with by trying different initial structures with different numbers of levels. Further work is needed in this area, both for the synthesis of digital filter structures and for the synthesis of digital compensator structures.
§9.1 Summary and Conclusions

In this section we will outline the basic points developed in this thesis. 
We will especially stress the difference between the issues as they relate to di- 
gital compensators as opposed to digital filters. 

Many elegant mathematical solutions exist for control problems. Often, the resulting compensators are directly implemented on large-scale computer systems, where speed and accuracy are assured and cost is not critical. The issues involved in the implementation of such compensators on small-scale digital systems have not received the attention they deserve. For these applications, the finite memory, relatively slow speeds, and expense of the hardware must be considered in the overall design process. Fortunately, these very issues have been examined in the context of digital signal processing, and a great many useful results exist. Our approach was to use, adapt, and extend these results to digital feedback compensators. This development is essentially the contribution of the thesis. In several situations, however, we have extended these results to the point where they also constitute a useful extension for digital filtering applications. These extensions will also be pointed out in this summary.

The steady-state LQG control and estimation problem was selected as a basic framework for several reasons. First, this type of controller has been shown to have desirable performance properties in terms of its robustness, multivariate formulation, optimal nature, and so forth. Second, the LQG problem has received a great deal of attention in the recent literature, and is being increasingly applied to real systems. Third, the LQG problem has an explicit scalar objective function, which can be adopted as a performance metric against which the degradation due to finite wordlength effects can be measured. It is not necessary to choose such a performance measure, or even the LQG problem at all; however, this choice allows us to develop results in a concrete setting. Finally, using the LQG control framework, we can bring out all the issues we wish to raise, and this can in fact be done using single-input single-output systems. As we will discuss, extensions to the multiple-input multiple-output case are straightforward, although the issue of multiple-input multiple-output structures remains largely unexplored.

In Chapter 2 we presented the assumptions, problem statement, and solution
method involved in an LQG system, and raised a key point. The calculations
involved in producing the compensator output and state values require a finite
amount of time t_c. This time must be accounted for in the LQG design procedure.
Two implications arise: 1) the sampling period T must be greater than t_c, and 2)
the compensator output at a given sample time can only depend on past
compensator state and input values. However, if T ≫ t_c, we must not constrain the
system to wait a full T seconds for its control update; it should only have to wait
t_c seconds. Hence, we presented the LQG solution method and the sample-skew idea
given in Kwakernaak and Sivan [1].
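One common way to carry the computation time t_c into a discrete-time design model is sketched below. It is a minimal illustration under stated assumptions, not a transcription of the sample-skew derivation in Kwakernaak and Sivan: it assumes a zero-order hold, an update of u(k) applied t_c seconds after the k-th sample with 0 ≤ t_c < T, and hypothetical function names. During each period the previous input u(k-1) is still held for the first t_c seconds and the new input u(k) for the remaining T - t_c seconds, which gives x(k+1) = Φ x(k) + Γ_0 u(k) + Γ_1 u(k-1). Both Γ terms come from the exponential of an augmented matrix; a small scaling-and-squaring Taylor series keeps the example dependent on NumPy only.

```python
import numpy as np


def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Ms = M / 2.0**s                      # scaled so that ||Ms|| <= 1/2
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ Ms / k
        E = E + term
    for _ in range(s):                   # undo the scaling: e^M = (e^{M/2^s})^(2^s)
        E = E @ E
    return E


def zoh_pair(A, B, h):
    """Return (e^{Ah}, integral_0^h e^{As} ds B) from one augmented exponential."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A * h
    M[:n, n:] = B * h
    E = expm_taylor(M)
    return E[:n, :n], E[:n, n:]


def discretize_with_computation_delay(A, B, T, tc):
    """Hypothetical helper: ZOH model when u(k) is applied tc seconds after sample k.

    x(k+1) = Phi x(k) + Gamma0 u(k) + Gamma1 u(k-1), with 0 <= tc < T.
    """
    Phi, _ = zoh_pair(A, B, T)
    Phi_late, Gamma0 = zoh_pair(A, B, T - tc)   # u(k) held over the last T - tc seconds
    _, Gamma_early = zoh_pair(A, B, tc)         # u(k-1) still held for the first tc seconds
    Gamma1 = Phi_late @ Gamma_early             # propagate that early contribution to (k+1)T
    return Phi, Gamma0, Gamma1
```

Appending u(k-1) to the state gives an (n+1)-state model for a single-input plant, on which a standard discrete-time LQG design can then be carried out. When t_c = 0 the Γ_1 term vanishes and the usual zero-order-hold model is recovered.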

Once such an ideal compensator is designed, it must be implemented in
finite-precision hardware. In Chapter 3 we presented the concept of a structure
as defined for digital filters, and the notation introduced by Chan for representing
such structures. An accurate notation for reflecting the arithmetic and
quantization operations in a structure, and the inherent precedence of these
operations, is critical: although all structures have the same transfer function and
the same performance as the ideal compensator under infinite precision, they will in
general all differ under finite-precision arithmetic. For control applications,
two points were stressed. First, a state-space description is insufficient to
represent all possible structures; in fact, it can represent only the class of
structures possessing one precedence level. Second, and more important, the
notation developed by Chan for filter structures is not quite suitable for
representing compensator structures; in fact, the concept of a structure is slightly
different in control applications. In digital filtering, the calculation time
necessary to compute the next filter output from the current filter states is
ignored, since it represents only series delay. Whether the filtered data emerges
0.1 seconds after its input, or 0.15 seconds, is really of no concern, as long as
the data rate is high enough. However, this delay must be included in any
compensator structure, since that structure is embedded in a feedback loop. If one
considers this delay as part of the plant (that is, as a series delay following the
compensator), then this effectively raises the dimension of the plant and of any
compensator designed via the LQG approach. On the other hand, including the delay
as part of the compensator accurately describes the operation of the compensator,
since every unit delay corresponds to a storage register, and it allows us to
consider more general structures in which the added delay does not appear as a
series delay following the compensator. Thus we adapted Chan's notation for
compensator structures, and called it the modified state space representation. It
has all the advantages of Chan's notation for digital filters, and furthermore
includes all the calculation delays that exist in the compensator. A major
implication of this definition of compensator structures is that a delay-canonic
structure (one that has a minimal number of delays) for an nth-order plant and
nth-order compensator has n+1 unit delay elements, instead of n as in digital
filtering. Thus a cascade of direct form I second-order sections, not canonic for
digital filters, is canonic for digital compensators. In the context of this
definition of structures, we presented several classes of structures and pointed
out that a straightforward implementation of the ideal compensator equations
(called a 'simple' structure) is not usually a good choice for steady-state LQG
compensators, since it has many more coefficients than nearly every other structure
used in digital filtering. Of course, in situations where it is not convenient to
compute the parameters of any structure other than the simple structure, such as in
adaptive control systems or in any system where the appropriate Riccati equations
must be computed online, the simple structure or a one- or two-level version of
this structure (still with many coefficients) must be used.

In Chapter 4 we presented several digital computer architecture concepts
as they relate to digital filters and to digital compensators. The basic ideas of
serialism and parallelism, the degree to which processes run sequentially or
concurrently, extend without modification to digital compensators. The intuition that
can be gained concerning precedence and maximally-parallel architectures from
the Chan notation for digital filters is identical to that gained from the modified
state space representation for digital compensators. However, the same cannot
be said concerning the application of pipelining to compensators. In fact, the
application of pipelining to compensators brings out another point: the interaction
between the ideal design process discussed in Chapter 2 and the implementation
of the resulting compensator. Basically, the use of pipelining alters a structure so
that the number of precedence levels in the structure is reduced, while still
producing nearly the same transfer function; the only difference is the addition of
one or more series delay units. Fewer precedence levels mean a smaller
minimum calculation time and a faster possible sampling rate. For digital filters,
the extra series delay encountered is of no importance, as discussed above;
what is significant is the potential increase in the data rate. For compensators,
however, this delay must be considered in the design process. If ignored, the
delay results in extra negative phase shift, and the performance of the control
system may deteriorate; it may even become unstable, as demonstrated in
Chapter 4. To include the effects of the delay, we can simply increase the order
of the plant (with one additional state per unit delay added) and redesign the
optimal LQG compensator for the higher sampling rate. The resulting higher-order
compensator structure must be able to be pipelined in the same manner as was
the original structure. Depending on the application, the pipelined control system
with its increased sample rate can have superior performance compared to the
original, slower, non-pipelined system.
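The delay-augmentation step can be sketched as follows, assuming the plant is already in discrete-time form (Φ, Γ, H) at the new, faster sampling rate and that pipelining adds d whole sample delays in series with the plant input. The function names are hypothetical. Each delay contributes one state per input, implemented as a shift register; the augmented model would then be handed to the usual LQG design.

```python
import numpy as np


def augment_with_input_delays(Phi, Gamma, H, d):
    """Append d unit delays in series at the plant input, one state per delay per input.

    Hypothetical helper. Returns (A, B, C) for the augmented discrete-time model.
    """
    n, m = Gamma.shape
    p = H.shape[0]
    N = n + d * m
    A = np.zeros((N, N))
    B = np.zeros((N, m))
    C = np.zeros((p, N))
    A[:n, :n] = Phi
    C[:, :n] = H
    if d == 0:
        B[:n, :] = Gamma
        return A, B, C
    A[:n, n + (d - 1) * m:] = Gamma                # oldest stored input drives the plant
    for j in range(1, d):                           # shift register between delay stages
        A[n + j * m:n + (j + 1) * m, n + (j - 1) * m:n + j * m] = np.eye(m)
    B[n:n + m, :] = np.eye(m)                       # newest input enters the first stage
    return A, B, C


def markov_parameters(A, B, C, count):
    """Impulse-response matrices h[k] = C A^(k-1) B for k = 1..count."""
    out, Ak = [], np.eye(A.shape[0])
    for _ in range(count):
        out.append(C @ Ak @ B)
        Ak = A @ Ak
    return out
```

With d = 2, the first two impulse-response samples of the augmented model are zero and the remaining ones reproduce the original sequence, i.e. the transfer function is multiplied by z^-2. That pure lag is exactly what the redesign must account for.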

In Chapters 5 through 7, the effects of finite wordlength in digital
compensators were investigated. These effects were divided into three areas: the
uncorrelated effects resulting from quantization of the multiplier products
(quantization noise, Chapter 5), the correlated effects of these same quantization
operations and the overflow nonlinearities in the compensator (limit cycles,
Chapter 7), and the effects of quantizing the infinite-precision coefficients of a
structure (coefficient quantization, Chapter 6).

The analysis of quantization noise includes an important sub-issue:
scaling. Scaling is necessary to match the dynamic range of the signals in the
structure to the dynamic range representable with the fixed-point words. Various
types of scaling were described for digital filters, depending on the known
characteristics of the compensator input signal; some are more conservative than
others (because they assume less is known about the input), thereby resulting in
higher noise levels. For digital feedback compensators, two issues were brought
out. First, the common LQG set-point configuration makes use of a compensator
with two inputs, either or both of which can have DC components. This fact would
require that the most conservative type of scaling be used (l_1 scaling), and
would in fact require the use of techniques for scaling multiple-input structures.
However, we showed that the use of an alternate but equivalent set-point
configuration can avoid this problem. With the alternate configuration, the
compensator has only one input, and this input has no DC component. Thus a less
conservative scaling procedure (l_2 scaling) can be employed. Second, the
stochastic scaling method applied equalizes the probability of overflow at every
node in the structure. However, this probability depends on the behavior of the
entire closed-loop system, not the compensator alone (which could be unstable).
Thus we have adapted this digital signal processing scaling procedure for use with
digital compensators.
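A minimal sketch of closed-loop stochastic scaling follows. It assumes a strictly proper compensator x_c(k+1) = A_c x_c(k) + B_c y(k), u(k) = C_c x_c(k), a plant x(k+1) = Φ x + Γ u + w, y = H x + v with independent white noises of covariances W and V, Gaussian signals, and a fixed-point range of ±1. Under those assumptions equal standard deviations mean equal overflow probabilities. Only the compensator state registers are scaled here, not every internal adder node, and the parameter n_sigma (how many standard deviations fit inside the word) and all names are hypothetical.

```python
import numpy as np


def dlyap(A, Q):
    """Solve X = A X A^T + Q for a stable A using the row-major Kronecker form."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)


def closed_loop_state_scaling(Phi, Gamma, H, W, V, Ac, Bc, Cc, n_sigma=4.0):
    """Rescale compensator states so each has closed-loop standard deviation 1/n_sigma.

    Plant:        x(k+1)  = Phi x + Gamma u + w,   y = H x + v
    Compensator:  xc(k+1) = Ac xc + Bc y,          u = Cc xc
    """
    n, nc = Phi.shape[0], Ac.shape[0]
    Acl = np.block([[Phi, Gamma @ Cc], [Bc @ H, Ac]])
    Qcl = np.block([[W, np.zeros((n, nc))], [np.zeros((nc, n)), Bc @ V @ Bc.T]])
    X = dlyap(Acl, Qcl)                    # steady-state closed-loop covariance
    sigma = np.sqrt(np.diag(X)[n:])        # std of each compensator state register
    T = np.diag(n_sigma * sigma)           # xc = T xs, so each xs has std 1/n_sigma
    Tinv = np.diag(1.0 / (n_sigma * sigma))
    return Tinv @ Ac @ T, Tinv @ Bc, Cc @ T, sigma
```

The similarity transform leaves the compensator transfer function unchanged; only the internal signal levels move. Because the covariance is computed for the whole loop, an open-loop unstable compensator poses no difficulty as long as the closed loop is stable.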

Once a structure is scaled, we can compute the effect of quantization
noise on some objective criterion. For digital filters, we presented the modelling
associated with roundoff and sign-magnitude truncation quantization, and
restricted the analysis to the more tractable (and lower-noise) case of roundoff. To
compute the noise power due to roundoff errors at the output of a digital filter, a
Lyapunov equation of order n can be solved, where n is the number of unit delays
in the filter. For digital compensators, again, the effect of roundoff errors on the
performance index is a closed-loop phenomenon. Thus we have adapted the
analysis method to include the entire plant and compensator system, as we did
for compensator scaling. In addition, for digital signal processing applications,
Mullis and Roberts have derived a one-level minimum roundoff noise filter structure.
It proved possible to adapt this method to produce a minimum roundoff noise
compensator structure. As before, the entire closed-loop system had to be
considered.
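The closed-loop roundoff calculation can be sketched under simple assumptions: one rounding per compensator state update (a double-length accumulator), rounding step q = 2^-b, and rounding errors that are white, mutually uncorrelated, uncorrelated with the plant noises, and of variance q^2/12. By linearity, the covariance added by roundoff can be computed separately from the noise-driven covariance, and the increase in the per-sample cost E[x'Qx + u'Ru] then follows from one Lyapunov equation for the combined plant and compensator. The cross-weighting term and rounding of the output u are ignored, and all names are hypothetical.

```python
import numpy as np


def dlyap(A, Q):
    """Solve X = A X A^T + Q for a stable A using the row-major Kronecker form."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)


def roundoff_increase_in_J(Phi, Gamma, H, Ac, Bc, Cc, Q, R, bits):
    """Steady-state increase in E[x'Qx + u'Ru] caused by compensator-state roundoff.

    Plant x(k+1) = Phi x + Gamma u, y = H x; compensator xc(k+1) = Ac xc + Bc y + e,
    u = Cc xc, where e is the roundoff error (variance q^2/12 per state, q = 2^-bits).
    """
    n, nc = Phi.shape[0], Ac.shape[0]
    Acl = np.block([[Phi, Gamma @ Cc], [Bc @ H, Ac]])
    q = 2.0 ** (-bits)
    Qe = np.zeros((n + nc, n + nc))
    Qe[n:, n:] = (q * q / 12.0) * np.eye(nc)    # roundoff enters the compensator states
    dX = dlyap(Acl, Qe)                          # covariance added by roundoff alone
    Qbar = np.block([[Q, np.zeros((n, nc))], [np.zeros((nc, n)), Cc.T @ R @ Cc]])
    return float(np.trace(Qbar @ dX))
```

For a digital filter alone, the same Lyapunov computation applied to the filter's own n-by-n state matrix gives the output roundoff noise power; here the plant must be included because the cost depends on the closed loop. Each added bit divides the predicted increase by four.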

To test the roundoff effects of different structures for implementing a
higher-order compensator, the F8 example was introduced. The results from a
roundoff analysis of these structures brought out several points. First, as in
digital filtering, the direct form II structure performed worse, in terms of the
increase in J due to roundoff noise, than factored forms like the cascade or parallel
structures; and, as in digital filtering, the pairing and ordering issues associated
with cascade structures were significant in determining their performance. As
expected, the block optimal minimum roundoff noise compensator structure was
better than any of the other structures tested. However, two points were raised
that were different for digital compensators as compared to digital filters. First,
the pairing issue is further complicated in control compensators due to the
presence of many real poles. Most digital filters have at most one real pole, whereas
controllers frequently have more than one. Thus these poles must be paired if
second-order sections are to be used. The same applies to real zeros. Thus even a
parallel compensator structure brings out the pairing issue, whereas parallel filter
structures involve no such consideration. Second, the default 'simple' structure
for digital compensators, not used for filter structures, performed comparatively
well. However, two structures with many fewer coefficients did even better.
The effect of coefficient rounding on performance is basically a
deterministic one. Given a set of coefficients, we can compute exactly the resulting
performance degradation. However, in digital filtering, a statistical approach based
on first-order sensitivities has been developed for estimating the coefficient
wordlength required to meet some degradation level. Thus it is not necessary to
directly evaluate the performance repeatedly until a suitable wordlength is found.
We have extended the statistical approach to the LQG compensator, and in so
doing, have raised an important point. Because the LQG compensator minimizes the
performance index J, all first-order sensitivities with respect to the compensator
coefficients are zero. Thus second-order sensitivities are necessary to estimate
the increase in J due to coefficient rounding, and in fact J can only increase with
such rounding. The necessity for second-order terms will be true of any parameter
optimization problem, for example, sub-optimal control problems like reduced-order
compensators. In fact, if a digital filter is designed to minimize some
differentiable scalar objective function, then a statistical wordlength estimate for
this filter using this same objective function must also use second-order
sensitivities. This constitutes an extension to the results for the implementation of
digital filters.
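The second-order argument can be checked numerically. The sketch below estimates the matrix of second partial derivatives of J by central differences and predicts the increase in J from a coefficient perturbation δ as (1/2) δᵀ ∇²J δ, which is valid when the gradient at the optimum is zero. The quadratic test function stands in for J and is purely hypothetical; for a real compensator J would be evaluated from the closed-loop covariance.

```python
import numpy as np


def hessian_fd(J, theta, h=1e-4):
    """Central-difference estimate of all (mixed) second partials of J at theta."""
    N = theta.size
    Hs = np.zeros((N, N))
    E = np.eye(N) * h
    for i in range(N):
        for j in range(i, N):
            val = (J(theta + E[i] + E[j]) - J(theta + E[i] - E[j])
                   - J(theta - E[i] + E[j]) + J(theta - E[i] - E[j])) / (4.0 * h * h)
            Hs[i, j] = Hs[j, i] = val          # the Hessian is symmetric
    return Hs


def second_order_increase(Hs, delta):
    """Predicted J(theta* + delta) - J(theta*) when the gradient at theta* is zero."""
    return 0.5 * float(delta @ Hs @ delta)
```

When the Hessian is positive semidefinite at the optimum, the prediction is nonnegative for every δ, which is the numerical counterpart of the statement that rounding can only increase J.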

Other issues concerning coefficient wordlength are raised when we apply
the statistical methods developed to the F8 system. First, we have evaluated
the structures according to the wordlength required to achieve a specific
degradation level. As in digital filters, there was a strong correlation between the
low-noise and low-coefficient-sensitivity structures. Again, for digital compensators,
we can state that the 'simple' structure performed well, but was still outperformed
by the same two structures as in the roundoff analysis. The SWL statistical
estimate developed using second-order sensitivities, a new concept, proved to be
conservative, as is its filtering counterpart based on first-order sensitivities.
However, for the five structures requiring the fewest bits, it was very accurate
(0 to 1.4 bits conservative). The SWL value was much more conservative for the
poorer structures: the direct form II, and the cascade and parallel structures using
identical inadvisable pole pairings. Unlike the usual digital filter statistical
estimate, a second, simpler-to-compute estimate was possible, based only on the
mean degradation in performance. (This value would be zero for any estimate based
on first-order sensitivities.) This MSWL estimate was very tightly related to the
SWL value, from 0.04 to 0.68 bits lower in all 10 cases, and can thus easily be used
for a relative wordlength comparison between several candidate structures or in an
optimization algorithm. The major advantage of these two statistical estimates over
a deterministic determination of wordlength was not in the computation time saved,
which was minimal (15% to 30%) for under 20 coefficients and nonexistent for over
20, but in one very important area. Since the estimates were continuous and
differentiable, they could be used as the scalar objective function for a structural
optimization procedure. In such a procedure based on the statistical estimate, we
had to compute all the (mixed) second partial derivatives of J with respect to the
N coefficients, but this needed to be done only once for the entire iterative
procedure. This point was further developed in Chapter 8.
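The mean-based idea can be made concrete under stated assumptions. Suppose each of the N coefficients is rounded to b fractional bits with step q = 2^-b, and the rounding errors are independent and uniform on [-q/2, q/2], so each has variance q^2/12. With a zero gradient, E[ΔJ] ≈ (1/2) Σ_i (∂²J/∂θ_i²) q²/12 = q² tr(∇²J)/24; only the diagonal of the Hessian matters because the cross terms have zero mean. Setting this equal to an allowed increase ε and solving for b gives a continuous, differentiable wordlength. This is one plausible form of a mean-based estimate, written for illustration; the thesis's MSWL may differ in normalization (for example, a relative rather than absolute tolerance, or an added sign bit), and the function names are hypothetical.

```python
import numpy as np


def mean_increase(Hs, bits):
    """E[Delta J] ~= q^2 trace(Hs) / 24 for independent uniform rounding errors."""
    q = 2.0 ** (-bits)
    return q * q * np.trace(Hs) / 24.0


def mean_based_wordlength(Hs, eps):
    """Continuous b such that mean_increase(Hs, b) == eps (requires trace(Hs) > 0)."""
    return -0.5 * np.log2(24.0 * eps / np.trace(Hs))
```

Because the result is a smooth function of the Hessian entries, it can serve directly as an objective in a structural optimization, which is the property emphasized above.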

In the discussion on limit cycles in Chapter 7, we reviewed the methods
used in digital filtering for dealing with limit cycles. Although our results in this
area were limited, four observations relating to digital compensators were brought
out. First, a control system with an open-loop unstable plant, or a plant with an
integrator pole, must of necessity have some sort of low-amplitude limit cycle.
The system output will increase from zero until it reaches the lowest quantization
level of the output A/D. Only then can control action seek to restore the system
to the zero level, but then the process will repeat. This situation is unavoidable,
since the system is essentially open-loop when the magnitude of the output
level is less than one A/D quantization level. Second, the global feedback loop
around the compensator will change the nature of the limit cycles in the
compensator, and can even cause limit cycles. For example, a finite impulse response
filter will not exhibit limit cycles, yet a feedback system using a finite impulse
response compensator may exhibit limit cycles. Third, the techniques used in
filtering for dealing with limit cycles often do not extend to compensators,
especially when the plant has an integrator or right-half plane pole. Finally, based on
the random rounding and experimental results in the digital signal processing
literature, it is not clear whether any limit cycles will exist in LQG systems. The
noise driving the system and the noise in the output will tend to quench any limit
cycle that may occur; this of course will depend on the intensity of the noise.
However, even though limit cycles themselves may be suppressed, other nonlinear
effects such as jump discontinuities may occur. Furthermore, the quantization
noise in the system is not white, and the very presence of correlated noise in the
system may cause difficulties. There are few techniques for handling these
effects, even for digital filters.
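The first observation can be reproduced with a toy example. The sketch assumes a hypothetical unstable scalar plant x(k+1) = a x(k) + u(k) with a = 1.1, an output A/D that rounds to step q, and a deadbeat-style control law u(k) = -a y(k) acting on the quantized measurement; it is not an LQG design and has no noise. Substituting the control law gives x(k+1) = a (x(k) - Q(x(k))), i.e. a times the A/D rounding error, so the state never settles at zero: inside the dead band |x| < q/2 it grows, and once it is measured it is pushed back only to within about a·q/2 of zero.

```python
import numpy as np


def quantize(x, q):
    """Round-to-nearest A/D model with step q (ties rounded up)."""
    return q * np.floor(x / q + 0.5)


def simulate(a=1.1, q=0.1, x0=1e-3, steps=300):
    """Closed loop x(k+1) = a x(k) + u(k), u(k) = -a Q(x(k)); returns the state history."""
    x = np.empty(steps + 1)
    x[0] = x0
    for k in range(steps):
        y = quantize(x[k], q)       # only the quantized output is available
        u = -a * y                  # hypothetical control law acting on the measurement
        x[k + 1] = a * x[k] + u     # equals a * (x[k] - y): a scaled rounding error
    return x
```

With the default values, after roughly 40 samples of growth the magnitude of x settles into a band between about 0.0495 and 0.055 and stays there: a sustained low-amplitude oscillation that neither decays nor grows.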

The final topic we treated in the thesis is the iterative constrained
optimization of structures. The basis for this technique lies in the work of Chan for
filters. However, we can again adapt the algorithm to handle digital compensators.
For minimizing roundoff effects, the adaptation was quite similar to that required to
compute the closed-form block optimal one-level minimum roundoff noise structure
of Chapter 5. However, for minimizing coefficient roundoff effects, our extension is
quite different from the Chan approach, since our statistical estimate is based on
second-order sensitivities. We demonstrated the optimization technique for
roundoff noise effects for several structures, but did not test the changes required to
produce a minimum coefficient wordlength structure. Our effort in optimization did
bring out two points which also extend the optimization technique of Chan for digital
filters. First, our technique for the constrained minimization of roundoff noise
was more general than that of Chan: we accounted for the exact number and
location of roundoff error sources in the structure, whereas Chan uses an
approximation to simplify his analysis. This change can easily be incorporated into
Chan's filter structure optimization algorithm. Second, we pointed out some general
approaches to selecting which portion of a compensator structure should be
optimized, that is, the portion that will produce the greatest improvement when
optimized. These guidelines also apply to the optimization of filter structures.

9.2 Future Efforts

Based on our results, there are several extensions that should be men- 
tioned, and also several new issues that we did not address. Let us first consid- 
er some of the extensions, both to other performance criteria and to other control 
or estimation problems. 

In principle, our results extend to the consideration of other performance
measures, such as gain margin, phase margin, and so forth. However, the details
of the derivations and the actual equations will be quite different. For example,
since such measures are not in general minimized by the compensator design, the
statistical wordlength estimate may be dominated by first-order sensitivities.
However, for the steady-state Kalman filter problem (considered at length by
Sripad [13]), our results would be more directly applicable. As in the LQG case, this
problem has a simple minimized scalar objective function, the trace of the error
covariance matrix. However, since this is not a control problem but an estimation
problem, it will have many of the characteristics of a digital filter. Thus, while a
statistical wordlength procedure for the Kalman filter will require the use of
second-order sensitivities (like the LQG case), the scaling and roundoff analysis
procedures will not depend on any closed-loop system behavior (unlike the LQG
case). Still, the adaptation of our results and techniques to digital Kalman filter
implementations will be fairly straightforward. Of course, the Kalman filter would
have to be considered to be a multiple-output compensator (see the discussion 
below on multiple-input multiple-output systems). 
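To make the Kalman filter objective concrete, the sketch below computes a steady-state error covariance by iterating the discrete Riccati equation and takes its trace as the scalar objective. It assumes a detectable, stabilizable model x(k+1) = Φ x + w, y = H x + v with noise covariances W and V, and it uses the one-step prediction form of the covariance; whether a filtered or predicted covariance is the better objective for a particular implementation is left open here, and the function name and stopping rule are illustrative.

```python
import numpy as np


def steady_state_prediction_covariance(Phi, H, W, V, iters=10000, tol=1e-13):
    """Iterate the discrete Riccati equation to the steady-state prediction error covariance."""
    P = W.copy()
    for _ in range(iters):
        S = H @ P @ H.T + V                        # innovation covariance
        K = Phi @ P @ H.T @ np.linalg.inv(S)       # one-step predictor gain
        P_next = Phi @ P @ Phi.T + W - K @ S @ K.T
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P
```

For a scalar random walk with W = 1, V = 2, and H = 1, the fixed point satisfies P² = P + 2, so the objective trace(P) is 2.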

Our efforts can also be easily extended to certain sub-optimal parameter 
optimization control problems. Both the optimal nature and the closed-loop aspects 
of the LQG problem are found in these controllers. In fact, if the same J is taken 
to be the performance measure, all our results apply. The equations will differ 
only in the fact that, in general, the compensator dimension will be smaller than 
the plant dimension. 

As mentioned above, there are several issues which we did not consider in
our work. The first of these involves the nature of the LQG problem. By expressing
all the desired performance characteristics of a control system in a single
all-encompassing scalar function J, there can be some question as to the relevance
between the minimization of J and the satisfying of the initial performance
objectives. The work of Harvey and Stein [24] mentioned in Chapter 2 is an important
step towards solving this problem. What we can state is the following: to the
extent that the index J is relevant to the desired control system performance,
our analyses based on increases in J will be relevant to the relative performance
of an implementation.

Another important issue is the application of our results to multiple-input
multiple-output compensators, since there are a great many real-world systems
that are multiple-input multiple-output in nature. Given some multiple-input
multiple-output structure, our results apply with only a few minor changes.
However, the whole question of how one designs multiple-input multiple-output
structures is basically unexplored. The modified state space notation is sufficiently
flexible to cover the multiple-input multiple-output case, if we simply have input and
output vectors instead of scalars. Multiple-output scaling is no problem, since the
present technique already scales all the nodes. However, some modifications will
be required to implement scaling procedures for multiple-input LQG compensators.
Certainly we can still compute the variances of all the nodes of the compensator,
accounting for the closed-loop nature of the control system and its driving and
measurement noises. Recall that the aim of the stochastic scaling procedure was
to equalize the probability of overflow at all the compensator nodes and the
compensator input (plant output). However, for multiple-output plants (multiple-input
compensators), there is a problem. Figure 9-1 shows a simple double-input
compensator.

Figure 9-1: Double-Input Compensator Control System

The variances of the two system outputs y_1 and y_2 will not in general be the
same. Thus we cannot equalize the probabilities of overflow at every node and
every compensator input. One possible solution is to select only one of the
compensator inputs to have the same probability of overflow (after scaling) as all
the nodes, and to allow the remaining compensator inputs to have a lower
probability of overflow. This can be accomplished by choosing the compensator input
y_i with the largest variance for use in the scaling procedure of Chapter 5. Instead
of normalizing K in (5.21) and K_y in (5.23) by dividing by the variance of y, we
will use the variance of y_i. However, in equation (5.22), the symbol y must refer to
the vector y, not y_i. Other than these changes, the rest of the compensator scaling
procedure basically remains the same. (In the full multiple-input multiple-output
scaling procedure, recall that u must also be a vector.) One other point involving
scaling should be mentioned. Since each A/D unit has its own scale factor, we must
also consider this scaling issue in the multiple-input sense. However, to preserve
the overall ideal system performance, all these scale factors must be the same.
Again, their choice will depend on the plant output whose combined variance and
system transients are the largest.
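The single-scale-factor rule for several A/D converters can be sketched as follows, assuming Gaussian signals, a fixed-point range of ±1, and a steady-state closed-loop plant-state covariance X that has already been computed (for instance with a Lyapunov solve). The function name and the parameter n_sigma are hypothetical, and only the variance criterion is modelled; the transient consideration mentioned above is not.

```python
import numpy as np


def common_adc_scale(H, X, V, n_sigma=4.0):
    """Choose one A/D scale factor shared by all plant outputs.

    H: output matrix, X: closed-loop plant-state covariance, V: measurement-noise
    covariance. Returns (index of the largest-variance output, shared scale factor).
    """
    var_y = np.diag(H @ X @ H.T + V)           # variance of each plant output
    i_max = int(np.argmax(var_y))
    return i_max, 1.0 / (n_sigma * np.sqrt(var_y[i_max]))
```

Every output is multiplied by the same factor, so the largest-variance output sits exactly n_sigma standard deviations below the overflow level and every other output sits further below it, i.e. with a lower probability of overflow.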


The question of multi-loop limit cycles does not really further complicate the
limit cycle question. If any effective limit cycle analysis method is found for
dealing with single-loop control systems, it should directly extend to the multi-loop
case.

Limit cycles themselves may not be an issue for LQG control systems. 
However, there is a middle ground between white additive quantization noise and 
a limit cycle oscillation. Jump phenomena and the presence of correlated noise 
can be very detrimental to control applications. The work of Sripad [13,56] and 
Parker and Girard [103] on the correlated nature of quantization errors should 
serve as a foundation for studying such effects. 

Another important issue is involved in the constrained optimization of
structures presented in Chapter 8. At one level, more work needs to be done in
testing and evaluating the minimization of coefficient wordlength. However, on a more
fundamental level we have the question of how to select the initial structure.
(Recall that the iterative optimization procedure must begin with a specific
structure and then apply transformations to it.) The choice of initial structure is
important because the iteration procedure cannot change the number of precedence
levels in the initial structure. The question of how many precedence levels to
use is a very complex one. It depends on the number of (unconstrained)
coefficients desired, the speed requirements of the application, and the acceptable
level of performance degradation. Furthermore, given an initial structure, we
do not always know the best way to choose which coefficients to constrain.
Such considerations are of importance to the optimization of both digital
compensator structures and digital filter structures.

Finally, we wish to mention a longer-term effort that may become of
importance to control engineers. This thesis has assumed right from the start
that a fixed-point numerical representation is being used. This implies minimal
expense and minimal computation time as compared to floating-point computation.
However, as the hardware evolves, new systems of arithmetic arise
that may be competitive with fixed-point. In particular, a system called FOCUS
[104] has been reported in the literature. The main motivation for FOCUS has
been the problems encountered in control and certain other signal processing
applications. Specifically, control systems require the most accurate control signals
when the system output is close to the desired level (to reduce steady-state
error) and less accurate control levels when far from the desired set point. The
FOCUS system of numerical representation and arithmetic combines the accuracy
advantages of floating point with the hardware simplicity and higher speed of
fixed point. Applying our work on compensator implementation to the FOCUS
number system may prove quite useful for control systems.

The purpose of this thesis was to expose the fundamental issues involved
in the digital implementation of control compensators, and to use, adapt, and
extend the techniques of digital signal processing in order to develop methods
applicable to control. We believe that our efforts have provided the foundation for an
overall methodology for the implementation of compensators.
Appendix A: F8 Data


This appendix presents the continuous-time F8 model discussed in
Chapters 5 and 6, and its discrete-time equivalent. The G and K matrices computed
by the procedures mentioned in Chapter 2 are also given. Finally, data
defining all 10 candidate structures analyzed in Chapters 5 and 6, and also the
optimized structure discussed in Chapter 8, will be presented.

The parameters of the sixth-order single-input single-output continuous-time 
F8 system are given below, following the notation of Chapter 2: 


The A matrix for the continuous-time sixth-order F8 system: 


-6.696d-01   6.7000d-04  -9.010d+00   0.000d+00  -1.577d+01   0.00d+00
 0.000d+00  -1.3467d-02  -1.411d+01  -3.220d+01  -4.330d-01   0.00d+00
 1.000d+00  -1.2000d-04  -1.214d+00   0.000d+00  -1.394d-01   0.00d+00
 1.000d+00   0.0000d+00   0.000d+00   0.000d+00   0.000d+00   0.00d+00
 0.000d+00   0.0000d+00   0.000d+00   0.000d+00  -1.200d+01   1.20d+01
 0.000d+00   0.0000d+00   0.000d+00   0.000d+00   0.000d+00   0.00d+00

The B matrix: 






0.000d+00 
0.000d+00 
0.000d+00 
0.000d+00 
0.000d+00 
1.000d+00






The C matrix: 






1.000d+00   3.091d-03   3.128d+01   1.000d+00   3.592d+00   0.000d+00
The Q matrix for the state norm:

 0.637d+00    0.000d+00    0.000d+00    0.000d+00    0.000d+00    0.000d+00
 0.0000d+00   2.6564d-07   2.6860d-03   0.0000d+00   3.0850d-04   0.0000d+00
 0.0000d+00   2.6880d-03   2.7174d+01   0.0000d+00   3.1210d+00   0.0000d+00
 0.0000d+00   0.0000d+00   0.0000d+00   2.7174d+01   0.0000d+00   0.0000d+00
 0.000d+00    3.086d-04    3.121d+00    0.000d+00    3.686d-01    0.000d+00


The R matrix for the control norm:
6.2520d+00


The driving noise covariance:

 0.000d+00   0.0000d+00   0.0000d+00   0.0000d+00   0.000d+00   0.000d+00
 0.000d+00   0.0000d+00   0.0000d+00   0.0000d+00   0.000d+00   0.000d+00
 0.000d+00   0.0000d+00   0.0000d+00   0.0000d+00   0.000d+00   0.000d+00
 0.000d+00   0.0000d+00   0.0000d+00   0.0000d+00   0.000d+00   0.000d+00
 0.000d+00   0.0000d+00   0.0000d+00   0.0000d+00   1.000d-06   0.000d+00
 0.000d+00   0.0000d+00   0.0000d+00   0.0000d+00   0.000d+00   1.000d-06

The measurement noise covariance matrix:

1.8441d-03

The discrete-time parameters for the above system sampled at 10 Hertz
were computed according to the equations in Chapter 2:
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Discrete-time Transition matrix ♦: (Every two rows shown below is actually only 
one row of the matrix 


8.941 68899876840d-01 
-9.46729891 346440d-06 
-2.2200696458 1 9 1 7d-0 1 
-3.21 783948076667d+00 
8.96632406763084d-02 
1 .661 1 8776302647d-05 
9.63226781 51 9899d-02 
9.99S96866444939d-01 
0.00000000000000d+00 
0.00000000000000d+00 
0.00000000000000d+00 
0.00000000000000d+00 


Input matrix T : 


-2.3384380364961 3d-02 
1 ,22296689664696d-05 
-8.09744421 678703d-04 
-6.1 9868380965549d-04 
4. 1 766 1 8 1 3028933d-02 
9.9999996673871 6d-02 


State weighting matrix Q: 


6.1 9891 054286 1 66d+00 
1.31875722331 738d+00 
3.01 609191 580769d-04 
2.60488981 1 63273d-05 
-1 .494531 04023977d+00 
-3.9 1 3 1 51 1 4446321 d-0 1 
1 .31 875722331 738d+00 
2.71 739588981 900d+01 
-3.31 37087051 8672d+00 
-6.29472742893936d-01 
-1 .391 74639302283d+00 
-1 ,68559768886560d-01 


6.9336541 8621 274d-05 
-8.61 683445782787d-01 
9.98658866653035d-01 
7.07041 304940429d-02 
-8.47007233273550d-06 
-5,823753461 84306d-02 
2.93704935200758d-06 
-6.2974891 1 536677d-02 
0.00000000000000d+00 
3.31 194227548261 d-01 
0.00000000000000d+00 
0.00000000000000d+00 


3.01 6091 91 580769d-04 
-3.3137087051 8672d+00 
2.47848377796048d-07 
-1 .83140582461 534d-05 
2.26275446201 938d-03 
3.009833800271 26d+00 
2.60488981 153273d-05 
-6.29472742893936d-01 
-1 .831 46582461 534d-05 
2.37776667424236d+00 
3.31 4761 3501 2857d-05 
1 .1 54401 1 4387846d+00 


-8.07897769474215d-01
-6.35698693844013d-01
-1.26236417743289d+00
1.33003345781691d-02
8.45367360609777d-01
-2.82235577782270d-02
-4.20099682385268d-02
-2.33843803649613d-02
0.00000000000000d+00
6.98805772451740d-01
0.00000000000000d+00
1.00000000000000d+00


-1.49453104023977d+00
-1.39174639302283d+00
2.26275446201938d-03
3.31476136012857d-05
2.49886408802715d+01
1.93726081288867d+00
-3.91316114440321d-01
-1.68559768886560d-01
3.00983380027126d+00
1.15440114387846d+00
1.93726081268867d+00
6.66122863060226d-01
Cross-weighting matrix M:


-3.63295371817073d-02
1.98100358396966d-06
6.49143026809568d-02
-3.60993906625930d-03
3.18190737871274d-02
2.03591671664107d-02


Control weighting matrix R:
6.26266793029727d+00


Output matrix L: 


1.00000000000000d+00
1.00000000000000d+00


3.09100000000000d-03
3.69200000000000d+00


3.12800000000000d+01
0.00000000000000d+00 


State driving noise covariance matrix Θ1:


4.33676915174706d-08
1.07868416889601d-09
-1.04991167649513d-09
-6.69997525494007d-11
2.00314796017624d-09
8.47200406101625d-11
1.67658416889601d-09
7.25020464385916d-11
-3.93401117854579d-08
-1.20680507935355d-09
-2.33843803650305d-08
-6.19868380959793d-10


-1.04991167649513d-09
-3.93401117854579d-08
6.13454985323572d-11
3.46061668931529d-10
-6.40159235541700d-11
-1.61814214531290d-09
-5.69997525494007d-11
-1.20680607935355d-09
3.46061668931529d-10
6.93058700394746d-08
1.22296689879342d-11
4.17661813028401d-08


2.00314796017624d-09
-2.33843803650305d-08
-6.40159236641700d-11
1.22296689879342d-11
9.94191359957619d-11
-8.09744421672949d-10
8.47200406101625d-11
-6.19868380959793d-10
-1.61814214531290d-09
4.17661813028401d-08
-8.09744421672949d-10
9.99999956738716d-08


Measurement noise covariance matrix Θ2:
1.84412510991842d-02
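The values in this appendix are printed in Fortran D-exponent notation (for example 1.84412510991842d-02), which Python's `float()` does not accept directly. The sketch below converts such literals and checks one relationship the printed data appear to satisfy: the discrete-time measurement noise covariance equals the continuous-time value 1.8441d-03 divided by the sampling period T = 0.1 s (10 Hertz). This relationship is our reading of the numbers, not an equation quoted from Chapter 2, and the agreement is limited by the five digits printed for the continuous-time value.

```python
# Values copied from this appendix (Fortran D-exponent notation).
CONTINUOUS_MEAS_NOISE = "1.8441d-03"          # measurement noise, continuous time
DISCRETE_MEAS_NOISE = "1.84412510991842d-02"  # measurement noise, discrete time
SAMPLE_RATE_HZ = 10.0


def parse_fortran_float(text: str) -> float:
    """Convert a Fortran literal such as '1.8441d-03' into a Python float.

    Python's float() understands 'e'/'E' exponents but not Fortran's 'd'/'D'.
    """
    return float(text.strip().lower().replace("d", "e"))


T = 1.0 / SAMPLE_RATE_HZ
continuous = parse_fortran_float(CONTINUOUS_MEAS_NOISE)
discrete = parse_fortran_float(DISCRETE_MEAS_NOISE)

# Assumed relationship (checked here, not derived): discrete = continuous / T.
ratio = discrete / (continuous / T)
print(f"discrete / (continuous / T) = {ratio:.6f}")
```

The ratio differs from 1 only in the fifth significant digit, which is the precision of the printed continuous-time value.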
Regulator gains G (also as computed in Chapter 2):


-7.64859358896862d-01
-1.69156608507296d+00


Filter gains K: 


6.30001213606085d-03
-2.06416833643128d-01
4.01197820069173d-03
7.47232808640808d-03
-2.17948949924278d-03
-2.17948949924279d-03


-3.38674676832647d-04 
1 .04707683664762d+00 


2.45637909052670d+00
5.10491114697691d+00


The following tables present the data defining the 10 scaled structures 
analyzed in Chapters 5 and 6, and the optimized structure discussed in Chapter 
8. Note that only the non-zero entries of the individual Ψ matrices are shown.
For all the structures the output node scaling parameter p equals 
0.02199717628337. 
Structure (a)


Direct Form II

Number of Precedence Levels: 2

Number of Coefficients in Scaled Structure: 13

(non-zero, non-unity entries in the modified state space matrices)


Non-zero entries in Ψ2, Ψ1:

Matrix   Index    Value
Ψ2       (7,1)    -2316.696730196619
         (7,2)    17216.30907463747
         (7,3)    -46638.88776849179
         (7,4)    60049.21454042769
         (7,5)    -37783.02361099942
         (7,6)    9373.006832979322
         (1,1)    1.0
         (2,2)    1.0
         (3,3)    1.0
         (4,4)    1.0
         (5,5)    1.0
         (6,6)    1.0
Ψ1       (6,1)    -0.11903082227744
         (6,2)    1.09870649812723
         (6,3)    -3.98894899287426
         (6,4)    7.49594995996606
         (6,5)    -7.82430422984935
         (6,6)    4.33762716269116
         (6,8)    0.00010128626129
         (1,2)    1.0
         (2,3)    1.0
         (3,4)    1.0
         (4,5)    1.0
         (5,6)    1.0
Structure (b)

Parallel Direct Form II, 4 first-order and 1 second-order sections

Number of Precedence Levels: 2

Number of Coefficients in Scaled Structure: 17

(non-zero, non-unity entries in the modified state space matrices)


Non-zero entries in Ψ2, Ψ1:

Matrix   Index    Value
Ψ2       (7,6)    0.03890104412969
         (7,5)    1.15283628631438
         (7,4)    0.13876077276467
         (7,3)    -0.00460563493139
         (7,2)    0.52228239125602
         (7,1)    -1.37949754700868
         (1,1)    1.0
         (2,2)    1.0
         (3,3)    1.0
         (4,4)    1.0
         (5,5)    1.0
         (6,6)    1.0
Ψ1       (2,2)    1.46297047489118
         (2,1)    -0.69683507325690
         (6,8)    0.87673782068497
         (3,3)    0.99868711357757
         (5,8)    0.84232806309622
         (4,4)    0.99514095413908
         (4,8)    0.17364017081712
         (5,5)    0.58903698597208
         (3,8)    0.15261498391194
         (6,6)    0.29179162411121
         (2,8)    0.28980851508818
         (1,2)    1.0
Structure (c)


Parallel Direct Form II, 3 second-order sections

Number of Precedence Levels: 2

Number of Coefficients in Scaled Structure: 15

(non-zero, non-unity entries in the modified state space matrices)
Pole Pairing: (Refer to figure 6-7)

z_p1 and z_p4

z_p2 and z_p3

z_p5 and z_p6 (These are the complex poles)


Non-zero entries in Ψ2, Ψ1:

Matrix   Index    Value
Ψ2       (7,6)    10.48075627883454
         (7,5)    -10.29571120349337
         (7,3)    -0.31185194361843
         (7,4)    0.30767918685885
         (7,2)    0.52228239125501
         (7,1)    -1.37949754700866
         (1,1)    1.0
         (2,2)    1.0
         (3,3)    1.0
         (4,4)    1.0
         (5,5)    1.0
         (6,6)    1.0
Ψ1       (2,2)    1.46297047489119
         (2,1)    -0.69683507325690
         (4,3)    -0.29140853484973
         (4,4)    1.29047873768878
         (6,5)    -0.58617482824346
         (6,6)    1.58417794011116
         (6,8)    0.07295197592120
         (4,8)    0.10856479467707
         (2,8)    0.28980851506819
         (1,2)    1.0
         (3,4)    1.0
         (5,6)    1.0
Structure (d)

Parallel Direct Form II, 3 second-order sections

Number of Precedence Levels: 2

Number of Coefficients in Scaled Structure: 15

(non-zero, non-unity entries in the modified state space matrices)
Pole Pairing: (Refer to figure 6-7)

z_p1 and z_p2

z_p3 and z_p4

z_p5 and z_p6 (These are the complex poles)


Non-zero entries in Ψ2, Ψ1:

Matrix   Index    Value
Ψ2       (7,6)    1.59834173340604
         (7,5)    -0.48730146270494
         (7,4)    15.71737776482720
         (7,3)    -15.69841756881241
         (7,2)    0.52228239125470
         (7,1)    -1.37949754700784
         (1,1)    1.0
         (2,2)    1.0
         (3,3)    1.0
         (4,4)    1.0
         (5,5)    1.0
         (6,6)    1.0
Ψ1       (2,2)    1.46297047489118
         (2,1)    -0.69683507325690
         (4,3)    -0.99383444709201
         (4,4)    1.99382806771667
         (6,5)    -0.17187605879836
         (6,6)    0.88082861008328
         (6,8)    0.48463047627064
         (4,8)    0.00148815020744
         (2,8)    0.28980851506826
         (1,2)    1.0
         (3,4)    1.0
         (5,6)    1.0
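Because all of these structures realize the same compensator, the Direct Form II denominator of structure (a) should equal the product of the three second-order section polynomials of structure (d). The sketch below checks this under two stated assumptions: that Ψ1 row 6 of structure (a) (columns 1 to 6, in that order) is the last row of a companion matrix, and that each second-order section of structure (d) realizes z² - s z + p, with s the (k,k) entry and -p the (k,k-1) entry. For the first section it uses (2,1) = -0.69683507325690, the value printed in structures (b) and (c).

```python
# Structure (a): Psi_1 row 6, columns 1..6 (assumed companion-form last row).
DF2_LAST_ROW = [-0.11903082227744, 1.09870649812723, -3.98894899287426,
                7.49594995996606, -7.82430422984935, 4.33762716269116]

# Structure (d): (s, p) per second-order section, section polynomial z^2 - s z + p.
SECTIONS_D = [
    (1.46297047489118, 0.69683507325690),  # z_p5, z_p6 (complex pair)
    (1.99382806771667, 0.99383444709201),  # z_p1, z_p2
    (0.88082861008328, 0.17187605879836),  # z_p3, z_p4
]


def poly_mul(a, b):
    """Multiply polynomials stored as coefficient lists, highest power first."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def companion_char_poly(last_row):
    """Characteristic polynomial of a companion matrix with the given last row.

    With x_i(k+1) = x_{i+1}(k) for i < n and x_n(k+1) = sum_i a_i x_i(k),
    the polynomial is z^n - a_n z^(n-1) - ... - a_1 (highest power first).
    """
    return [1.0] + [-a for a in reversed(last_row)]


cascade = [1.0]
for s, p in SECTIONS_D:
    cascade = poly_mul(cascade, [1.0, -s, p])

direct = companion_char_poly(DF2_LAST_ROW)
max_mismatch = max(abs(c - d) for c, d in zip(cascade, direct))
print(f"largest coefficient mismatch: {max_mismatch:.1e}")
```

The remaining mismatch is at the level of the last printed digits, which is what one would expect if both tables describe the same set of poles.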
Structure (e)


Parallel, One-level Version of (c)

Number of Precedence Levels: 1

Number of Coefficients in Scaled Structure: 16

(non-zero, non-unity entries in the modified state space matrices)
Pole Pairing: same as (c)


Non-zero entries in Ψ1:

Matrix   Index    Value
Ψ1       (2,1)    -0.696835073257
         (2,2)    1.462970474891
         (2,8)    0.289808515068
         (7,3)    -0.089660341046
         (4,3)    -0.291408534850
         (4,4)    1.290478737689
         (4,8)    0.108564794677
         (7,4)    0.085201505052
         (6,5)    -0.586174828243
         (7,2)    -0.615413829047
         (6,6)    1.584177940111
         (6,8)    0.072951975921
         (7,1)    -0.363944688371
         (7,5)    -6.143554925433
         (7,6)    6.307670104940
         (7,8)    0.949356818741
         (1,2)    1.0
         (3,4)    1.0
         (5,6)    1.0
Structure (f)


Block Optimal Parallel

Number of Precedence Levels: 1

Number of Coefficients in Scaled Structure: 25

(non-zero, non-unity entries in the modified state space matrices)
Pole Pairing: same as (c) and (e)


Non-zero entries in Ψ1:

Matrix   Index    Value
Ψ1       (2,1)    -0.33647827003132
         (2,2)    0.68249200666952
         (2,8)    0.65051552691033
         (7,3)    -0.08038901173235
         (4,3)    -0.20036428295682
         (4,4)    1.19946355533120
         (4,8)    0.11870639047793
         (7,4)    0.07597348041937
         (6,5)    0.19085044755223
         (7,2)    -0.43070856151277
         (6,6)    0.73170685139682
         (6,8)    0.45237970547959
         (7,1)    -0.73947570074840
         (7,5)    -0.54742490834586
         (7,6)    0.94099544414975
         (7,8)    0.94935681874100
         (1,1)    0.78047846822148
         (1,2)    0.48789111196657
         (3,3)    0.09101518235780
         (3,4)    0.90953905526798
         (5,5)    0.85247108871418
         (5,6)    0.19692962981701
         (1,8)    -0.13770626781352
         (3,8)    -0.00287783447672
         (5,8)    -0.06296709412667





Appendix A: F8 Data 









Structure (g)

Cascade Direct Form II, 3 second-order sections

Number of Precedence Levels: 4

Number of Coefficients in Scaled Structure: 15

(non-zero, non-unity entries in the modified state space matrices)
Pole and Zero Pairing: (Refer to figure 5-7)

Section 1: z_p5 and z_p6, z_z1

Section 2: z_p3 and z_p4, z_z4 and z_z5

Section 3: z_p1 and z_p2, z_z2 and z_z3

Non-zero entries in Ψ4, Ψ3, Ψ2, Ψ1:


Matrix   Index




Structure (h)

Cascade Direct Form II, 3 second-order sections

Number of Precedence Levels: 4

Number of Coefficients in Scaled Structure: 16

(non-zero, non-unity entries in the modified state space matrices)
Pole and Zero Pairing: (Refer to figure 6-7)

Section 1: z_p2 and z_p3, z_z2
Section 2: z_p5 and z_p6, z_z4 and z_z5
Section 3: z_p1 and z_p4, z_z1 and z_z3

Non-zero entries in Ψ4, Ψ3, Ψ2, Ψ1:

Matrix   Index    Value
Ψ4       (7,6)    -35.08378898367869
         (7,6)    8.11873624443843
         (7,7)    26.98802315299709
         (1,2)    1.0
         (2,1)    1.0
         (3,4)    1.0
         (4,3)    1.0
         (6,6)    1.0
         (6,7)    1.0
Ψ3       (7,6)    1.29047873768878
         (7,5)    -0.29140853484973
         (7,4)    -0.46738885908277
         (7,3)    0.22026204990314
         (7,7)    0.25932232326397
         (1,1)    1.0
         (2,2)    1.0
         (3,7)    1.0
         (4,4)    1.0
         (5,5)    1.0
         (6,6)    1.0
Ψ2       (7,3)    1.46297047489118
         (7,2)    -0.69683507325690
         (7,1)    -1.79860505553554
         (7,6)    1.85943686039663
         (1,6)    1.0
         (2,1)    1.0
         (3,2)    1.0
         (4,3)    1.0
         (5,4)    1.0
         (6,5)    1.0
Ψ1       (6,2)    1.58417794011116
         (6,1)    -0.58617482824346
         (6,8)    0.07295197611457
         (1,2)    1.0
         (2,3)    1.0
         (3,4)    1.0
         (4,5)    1.0
         (5,6)    1.0
Structure (i)


Cascade Direct Form I, 3 second-order sections

Number of Precedence Levels: 3

Number of Coefficients in Scaled Structure: 14

(non-zero, non-unity entries in the modified state space matrices)
Pole and Zero Pairing: same as (g)


Non-zero entries in Ψ3, Ψ2, Ψ1:

Matrix   Index    Value
Ψ3       (7,3)    -320.6463446770277
         (7,2)    167.6620966439277
         (7,6)    0.88082861008329
         (7,4)    -0.17187606879836
         (7,8)    163.0897474939010
         (1,6)    1.0
         (2,1)    1.0
         (3,7)    1.0
         (4,3)    1.0
         (6,8)    1.0
         (6,6)    1.0
Ψ2       (8,2)    -0.02695431628362
         (8,1)    0.01249866671420
         (8,4)    1.99382806771669
         (8,3)    -0.99383444709202
         (8,8)    0.01471612360640
         (1,2)    1.0
         (2,3)    1.0
         (3,4)    1.0
         (4,5)    1.0
         (5,6)    1.0
         (6,7)    1.0
         (7,8)    1.0
Ψ1       (8,1)    -0.11914766720871
         (8,8)    0.39558416566143
         (8,3)    1.46297047489118
         (8,2)    -0.69683507325690
         (1,2)    1.0
         (2,3)    1.0
         (3,4)    1.0
         (4,5)    1.0
         (5,6)    1.0
         (6,7)    1.0
         (7,8)    1.0
Structure (j)


Simple

Number of Precedence Levels: 3

Number of Coefficients in Scaled Structure: 50

(non-zero, non-unity entries in the modified state space matrices)


Non-zero entries in Ψ3, Ψ2, Ψ1:

Matrix   Index    Value
Ψ3       (7,1)    0.79382319292953
         (7,2)    0.13324583104339
         (7,3)    -1.28133934418680
         (7,4)    1.63323383955448
         (7,5)    -0.22354700928633
         (7,6)    -1.07427890614435
         (1,1)    1.0
         (2,2)    1.0
         (3,3)    1.0
         (4,4)    1.0
         (5,5)    1.0
         (6,6)    1.0
Ψ2       (1,1)    0.89415889987584
         (1,2)    0.02219872745941
         (1,3)    -0.40090758959435
         (1,4)    -0.00008683035986
         (1,5)    -0.17493645514225
         (1,6)    -0.12721034794726
         (2,1)    -0.00059340804849
         (2,2)    0.99865886665303
         (2,3)    -0.00167440057433
         (2,4)    -0.00789688283429
         (2,5)    0.00003836756094
         (2,6)    0.00000711410910
         (3,1)    0.18063685658836
         (3,2)    -0.00638575742111
         (3,3)    0.84535736060977
         (3,4)    0.00002869993983
         (3,5)    -0.02382581059578
         (3,6)    -0.01138138229376
         (4,1)    0.10382245521832
         (4,2)    0.00119679543943
         (4,3)    -0.02270574399998
         (4,4)    0.99999686644492
         (4,5)    -0.01171380959155
         (4,6)    -0.00509674027466
         (5,5)    0.30119422754825
         (6,6)    0.68880300468145
         (1,8)    0.26341237237753
         (2,8)    -0.02208649707703
         (3,8)    0.32520493273496
         (4,8)    0.32736911273784
         (5,8)    -0.43182585076519
         (6,8)    -0.43809680729855
         (1,7)    -0.02223658684666
         (2,7)    0.00000003108449
         (3,7)    -0.00155168069308
         (4,7)    -0.00064200329831
         (5,7)    0.19562955072811
         (6,7)    0.47519418802086
         (6,6)    1.0
Ψ1       (3,2)    -0.02874919859251
         (8,1)    -0.02486071260568
         (8,3)    -0.38589414084909
         (8,4)    -0.02282538209825
         (8,5)    -0.01812935994315
         (8,8)    1.07470407782701
         (1,1)    1.0
         (2,2)    1.0
         (3,3)    1.0
         (4,4)    1.0
         (5,5)    1.0
         (6,6)    1.0
         (7,7)    1.0
Optimized Structure Considered in Chapter 8
(Based on Structure (c))

Number of Precedence Levels: 2

Number of Coefficients in Scaled Structure: 17

(non-zero, non-unity entries in the modified state space matrices)


Non-zero entries in Ψ2, Ψ1:

Matrix   Index    Value
Ψ2       (7,6)    1.34436168286127
         (7,5)    -0.50777452620114
         (7,3)    -0.31185194361846
         (7,4)    0.30767918685888
         (7,2)    0.52228239125601
         (7,1)    -1.37949754700866
         (1,1)    1.0
         (2,2)    1.0
         (3,3)    1.0
         (4,4)    1.0
         (5,5)    1.0
         (6,6)    1.0
Ψ1       (2,2)    1.46297047489118
         (2,1)    -0.69683507325689
         (4,3)    -0.29140853484973
         (4,4)    1.29047873768877
         (6,5)    0.16466105298259
         (6,6)    0.65028182128291
         (6,8)    0.56874389081747
         (4,8)    0.10856479467706
         (2,8)    0.28980851506819
         (5,5)    0.93389611882824
         (5,6)    0.12826858819766
         (1,2)    1.0
         (3,4)    1.0
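In structure (f) and in the optimized structure, some second-order sections are realized as full 2×2 blocks rather than in companion form. Such a block realizes the same pole pair as the corresponding section of structure (c) exactly when its trace equals the pole sum and its determinant equals the pole product. The sketch below checks this. The (row, column) assignment of the block entries is an assumption on our part, since several index labels in the scanned tables are ambiguous; trace and determinant do not depend on which off-diagonal entry is (k,k+1) and which is (k+1,k).

```python
# Structure (c): (pole sum, pole product) of each second-order section,
# read as the (k,k) entry and minus the (k,k-1) entry of Psi_1.
C_SECTIONS = {
    "states 1-2": (1.46297047489119, 0.69683507325690),
    "states 3-4": (1.29047873768878, 0.29140853484973),
    "states 5-6": (1.58417794011116, 0.58617482824346),
}

# 2x2 blocks ((a11, a12), (a21, a22)) of Psi_1 (assumed index assignment).
BLOCKS = {
    ("(f)", "states 1-2"): ((0.78047846822148, 0.48789111196657),
                            (-0.33647827003132, 0.68249200666952)),
    ("(f)", "states 3-4"): ((0.09101518235780, 0.90953905526798),
                            (-0.20036428295682, 1.19946355533120)),
    ("(f)", "states 5-6"): ((0.85247108871418, 0.19692962981701),
                            (0.19085044755223, 0.73170685139682)),
    ("optimized", "states 5-6"): ((0.93389611882824, 0.12826858819766),
                                  (0.16466105298259, 0.65028182128291)),
}


def trace_and_det(block):
    """Trace and determinant of a 2x2 block; together they fix its eigenvalues."""
    (a11, a12), (a21, a22) = block
    return a11 + a22, a11 * a22 - a12 * a21


errors = {}
for (structure, states), block in BLOCKS.items():
    trace, det = trace_and_det(block)
    pole_sum, pole_product = C_SECTIONS[states]
    errors[(structure, states)] = max(abs(trace - pole_sum), abs(det - pole_product))
    print(f"{structure:9s} {states}: max error {errors[(structure, states)]:.1e}")
```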
Appendix B: The Adjoint Lyapunov Operator


If we take the trace of the product of two matrices to be an inner product on the space of matrices, and π to be a matrix operator, then:

$$\mathrm{trace}\big(\pi(X)\,U\big) = \mathrm{trace}\big(X\,\pi^*(U)\big) \qquad \text{(B.1)}$$

where π* is the adjoint operator of π. For π(X) = X - AXA', the operator π* can be derived from (B.1):

$$\begin{aligned}
\mathrm{trace}\big((X - AXA')\,U\big) &= \mathrm{trace}(XU) - \mathrm{trace}(AXA'U) \\
&= \mathrm{trace}(XU) - \mathrm{trace}(XA'UA) \\
&= \mathrm{trace}\big(X(U - A'UA)\big)
\end{aligned} \qquad \text{(B.2)}$$

where the second equality uses the cyclic property of the trace. Thus π*(U) = U - A'UA.
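The adjoint relation can be checked numerically. The sketch below draws random square matrices A, X, and U with the Python standard library and compares trace(π(X)U) with trace(Xπ*(U)), using the same trace pairing as (B.1). The matrix size and random seed are arbitrary choices; the identity holds for general (not only symmetric) X and U.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def lyapunov_op(x, a):
    """pi(X) = X - A X A'."""
    return subtract(x, matmul(matmul(a, x), transpose(a)))


def lyapunov_adjoint(u, a):
    """pi*(U) = U - A' U A, the adjoint derived in (B.2)."""
    return subtract(u, matmul(matmul(transpose(a), u), a))


rng = random.Random(0)
n = 4


def random_matrix():
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


A, X, U = random_matrix(), random_matrix(), random_matrix()
lhs = trace(matmul(lyapunov_op(X, A), U))
rhs = trace(matmul(X, lyapunov_adjoint(U, A)))
print(f"trace(pi(X) U)  = {lhs:.12f}")
print(f"trace(X pi*(U)) = {rhs:.12f}")
```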

As used in section 5.6, the Lyapunov equation (5.37) and the trace (5.38) were replaced by the equivalent equations (5.39) and (5.40). Relating this to the derivation above:

X -1/ 

A-*H 

(B.3) 

w*(u) - n 

A 2 

*(*)-— r a ! s T 2 
Appendix C: A Simplified Evaluation of (6.23)


In this appendix we will derive the expression used in the SWL and MSWL algorithms for computing the second partial derivatives of J. Evaluating this expression will be simpler than directly computing (6.22) and (6.25). Using (6.22) and the expressions in (6.25) and (6.26), and defining the following matrices:


* 


n 


Vk 


da 


D ^'• 



(C.1) 


Lk 


ad 




(C.2) 


we can rewrite X_ij as:



0 

0 


0 

a*, 

dc j i 



+ 


o 

0 


0 

a*. 


9c 


/ J 




0 0 


0 0 ■ 

+ 

a** 

n 

(d,zd,'*o 2 ) 

a*.' 

r\ 


o 

!<o 

> 


dc j 
3ft


a 2 ** 

3Cy3Cy 


(°1 Z0 l'*°2) 


n 

0 9' 
00 


(C.3) 


Thus X_ij will be a matrix whose lower right-hand (n+1)×(n+2) portion is non-zero, and whose remaining entries are zero. Hence the trace expression in (6.23) can be simplified:


a2j 

3c, 3 Cj 


- 2 trace 


9¥„ 3¥_ 

- '(rtf 1 2) + 

OC j oc ^ 0C ^ 


(/W1)^(/»f2) 

acj 


+ 2 trace 


3V 


3c 


/ 


39. 

(W3)— (M4) + 


3c, 3c, 


(rtf 5) 


(C.4) 


where M1, M2, M3, and M4 are precomputed matrices (computed only once for all i and j) from the fixed matrices 0^, Z, A, Og, 0, and ¥c. As shown in (C.4), at most three matrix multiplications and a trace operation are required for each term in (C.4), for each i and j. Thus, in terms of operation counts, the calculation of (6.23) would be roughly proportional to N^2(2n+1)^3.

In fact, this expression can be further simplified to reduce the computational load. By substituting (6.27) and (6.28) into the first and second partial derivatives with respect to c_i and c_j, applying simple trace identities, and combining the resulting matrices with M1, M2, M3, M4, and M5, we can produce:
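$$\frac{\partial^2 J}{\partial c_i\,\partial c_j} = 2\,\mathrm{trace}\left[(M_6)E_{kl}(M_7)\frac{\partial \Sigma}{\partial c_j}\right] + 2\,\mathrm{trace}\left[(M_8)E_{rs}(M_9)\frac{\partial \Sigma}{\partial c_i}\right] + 2\,\mathrm{trace}\left[E_{kl}(M_{10})E_{rs}(M_{11})\right] + 2\,\mathrm{trace}\left[E_{ks}(M_{12})\right] \quad (\text{last term only if } l = r) \qquad \text{(C.5)}$$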


where the precomputable matrices M7, M8, M9, M10, M11, and M12 will depend on which specific precedence-level matrices contain coefficients c_i and c_j. As the number of precedence levels goes up, so does the number of such matrices, but they can still all be precomputed. Equation (C.5) can be simplified by taking advantage of the special form of E_kl and E_rs (described in section 6.4). For the first trace term of (C.5), we can write:

$$(M_6)\,E_{kl}\,(M_7) = (V_1)(V_2) \qquad \text{(C.6)}$$

where V1 is a column (2n+1)-vector equal to the k-th column of M6, and V2 is a row (2n+1)-vector equal to the l-th row of M7. Thus the first term of (C.5) can be written as:


2 trace (1/1 ) (1/2) = 2 trace (1/2) (1/ 1 ) 


9Cy 


9c , 


2 (1/2) — (1/1) (C.7) 

OC y 


Now, only one vector-matrix multiplication and one vector dot product are required per i and j. In terms of operation counts, this simplification reduces the calculation of (6.23) (given the first partial derivatives of Σ) from being roughly proportional to N^2(2n+1)^3 to being proportional to N^2(2n+1)^2, a large savings.
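The saving in the first term comes from E_kl having a single non-zero entry, so (M6)E_kl(M7) is the outer product of the k-th column of M6 and the l-th row of M7. The sketch below compares the direct evaluation of 2 trace[(M6)E_kl(M7)D] with the reduced form 2(V2)D(V1). M6, M7, and the matrix D (a stand-in for the first-partial-derivative factor) are random placeholders, the size 5 plays the role of 2n+1, and indices are 0-based in the code.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def elementary(size, k, l):
    """E_kl: a size x size matrix with a single 1 in row k, column l (0-based)."""
    e = [[0.0] * size for _ in range(size)]
    e[k][l] = 1.0
    return e


rng = random.Random(1)
size = 5  # plays the role of 2n+1


def random_matrix():
    return [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]


M6, M7, D = random_matrix(), random_matrix(), random_matrix()
k, l = 1, 3

# Direct: three matrix products and a trace.
direct = 2.0 * trace(matmul(matmul(matmul(M6, elementary(size, k, l)), M7), D))

# Reduced, as in (C.7): V1 = k-th column of M6, V2 = l-th row of M7.
v1 = [M6[i][k] for i in range(size)]
v2 = M7[l]
v2_times_d = [sum(v2[i] * D[i][j] for i in range(size)) for j in range(size)]  # one vector-matrix product
reduced = 2.0 * sum(v2_times_d[j] * v1[j] for j in range(size))                # one dot product
print(f"direct = {direct:.12f}, reduced = {reduced:.12f}")
```

The reduced form costs on the order of (2n+1)^2 operations instead of (2n+1)^3, which is the source of the change in operation count described above.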

The second term of (C.5) can be simplified in exactly the same manner as the first. For the third term, since there is no dependence on c_i or c_j other than in E_kl and E_rs, we can reduce it to:

$$2\,\mathrm{trace}\left[E_{kl}(M_{10})E_{rs}(M_{11})\right] = 2\,M_{10}(l,r)\,M_{11}(s,k) \qquad \text{(C.8)}$$

This involves even less computation than the first two terms. Finally, the fourth term reduces to the simplest form of all:

$$2\,\mathrm{trace}\left[E_{ks}(M_{12})\right] = 2\,M_{12}(s,k) \qquad \text{(C.9)}$$

Thus, overall, the number of operations involved in computing this simplified expression will be proportional to N^2(2n+1)^2, where N is the number of rounded coefficients in the structure and n is the plant order.
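The reductions (C.8) and (C.9) rest on two facts about elementary matrices: E_kl A E_rs = A(l,r) E_ks, and E_kl E_rs = E_ks when l = r (zero otherwise), together with trace(E_ks B) = B(s,k). The sketch below checks both identities for every index combination on random 4×4 matrices. It reads the last term of (C.5) as arising from 2 trace[E_kl E_rs (M12)], which is our interpretation of where the condition l = r comes from; M10, M11, and M12 are random placeholders and indices are 0-based.

```python
import itertools
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def elementary(size, k, l):
    """E_kl: a size x size matrix with a single 1 in row k, column l (0-based)."""
    e = [[0.0] * size for _ in range(size)]
    e[k][l] = 1.0
    return e


rng = random.Random(2)
size = 4


def random_matrix():
    return [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]


M10, M11, M12 = random_matrix(), random_matrix(), random_matrix()

third_ok = True
fourth_ok = True
for k, l, r, s in itertools.product(range(size), repeat=4):
    e_kl, e_rs = elementary(size, k, l), elementary(size, r, s)
    # Third term: direct trace versus the scalar form (C.8).
    third_direct = 2.0 * trace(matmul(matmul(matmul(e_kl, M10), e_rs), M11))
    third_reduced = 2.0 * M10[l][r] * M11[s][k]
    # Fourth term: E_kl E_rs vanishes unless l == r, then (C.9) applies.
    fourth_direct = 2.0 * trace(matmul(matmul(e_kl, e_rs), M12))
    fourth_reduced = 2.0 * M12[s][k] if l == r else 0.0
    third_ok = third_ok and abs(third_direct - third_reduced) < 1e-12
    fourth_ok = fourth_ok and abs(fourth_direct - fourth_reduced) < 1e-12

print("(C.8) holds:", third_ok, " (C.9) holds:", fourth_ok)
```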